
How to create a single jar in Scala with sbt assembly on AWS EMR? Running into "deduplicate: different file contents found in the following:"

Stack Overflow user
Asked 2020-05-29 06:26:05
Answers: 1, Views: 547, Followers: 0, Score: 1

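I'm on an AWS EMR cluster that I just stood up. I've compiled my Scala file and want to build it as an assembly. However, when I run `sbt assembly`, I run into deduplicate errors.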

Following https://medium.com/@tedherman/compile-scala-on-emr-cb77610559f0, I originally had a symlink pointing my `lib` directory at the Spark jars under `/usr`:

ln -s /usr/lib/spark/jars lib

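I've since noticed that my code passes `sbt compile` with or without the symlink. What I don't understand is why the `sbt assembly` deduplicate errors happen, or how to resolve them. I'll also note that, as the article prescribes, I install sbt in a bootstrap action.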

With the symlink

Some of them look like clear, exact duplicates; for example:

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.10.1.jar:shaded/parquet/org/codehaus/jackson/util/CharTypes.class
[error] /usr/lib/spark/jars/parquet-jackson-1.10.1-spark-amzn-1.jar:shaded/parquet/org/codehaus/jackson/util/CharTypes.class

Others appear to be competing versions:

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-core_2.11/jars/spark-core_2.11-2.4.3.jar:org/spark_project/jetty/util/MultiPartOutputStream.class
[error] /usr/lib/spark/jars/spark-core_2.11-2.4.5-amzn-0.jar:org/spark_project/jetty/util/MultiPartOutputStream.class

I don't understand why there are competing versions: whether that's the default, or whether I did something to introduce them.
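A likely explanation, offered as a hedged sketch: sbt treats a project's `lib/` directory as its unmanaged-jar directory by default, so the symlink puts every jar from EMR's `/usr/lib/spark/jars` (Spark `2.4.5-amzn-0`) on the classpath that `sbt assembly` bundles, next to the Spark `2.4.3` jars that Ivy resolves for the managed dependencies. A common remedy is to drop the symlink and compile against Spark in `provided` scope, so EMR supplies Spark at runtime and it never enters the fat jar. The sketch assumes that Spark arrives transitively through `scoring-code-spark-api_2.4.3` (inspect the dependency graph to confirm) and that the cluster runs Spark 2.4.5 on Scala 2.11, as the EMR jar names suggest.

```scala
// build.sbt (sketch): compile against Spark but leave it out of the fat jar.
// Assumptions: EMR runs Spark 2.4.5 on Scala 2.11, and Spark reaches the
// build transitively via scoring-code-spark-api (verify before relying on this).
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // "provided": on the compile classpath, but excluded by `sbt assembly`
  "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided",
  // Keep the DataRobot API, but stop it from pulling Spark 2.4.3 into the jar
  ("com.datarobot" % "scoring-code-spark-api_2.4.3" % "0.0.19")
    .excludeAll(ExclusionRule(organization = "org.apache.spark"))
)
```

This entry for `scoring-code-spark-api_2.4.3` is meant to replace the existing one in the `libraryDependencies` list, not sit beside it; a second, unexcluded copy would bring Spark back in.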

Without the symlink

I figured that removing it would cause fewer problems, and it did, but I still get deduplicate errors (just fewer of them):

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.hadoop/hadoop-yarn-api/jars/hadoop-yarn-api-2.6.5.jar:org/apache/hadoop/yarn/factory/providers/package-info.class
[error] /home/hadoop/.ivy2/cache/org.apache.hadoop/hadoop-yarn-common/jars/hadoop-yarn-common-2.6.5.jar:org/apache/hadoop/yarn/factory/providers/package-info.class

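I don't understand why the above is a conflict, given that one is `hadoop-yarn-api-2.6.5.jar` and the other is `hadoop-yarn-common-2.6.5.jar`. Different names, so why is it a duplicate?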

Others seem to be different versions of the same library:

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/javax.inject/javax.inject/jars/javax.inject-1.jar:javax/inject/Named.class
[error] /home/hadoop/.ivy2/cache/org.glassfish.hk2.external/javax.inject/jars/javax.inject-2.4.0-b34.jar:javax/inject/Named.class

Some have the same file name but come from different jars:

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.arrow/arrow-format/jars/arrow-format-0.10.0.jar:git.properties
[error] /home/hadoop/.ivy2/cache/org.apache.arrow/arrow-memory/jars/arrow-memory-0.10.0.jar:git.properties
[error] /home/hadoop/.ivy2/cache/org.apache.arrow/arrow-vector/jars/arrow-vector-0.10.0.jar:git.properties

Same with these:

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-catalyst_2.11/jars/spark-catalyst_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-core_2.11/jars/spark-core_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-graphx_2.11/jars/spark-graphx_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class

For reference, some other information:

The imports in my Scala object:

import org.apache.spark.sql.SparkSession
import java.time.LocalDateTime
import com.amazonaws.regions.Regions
import com.amazonaws.services.secretsmanager.AWSSecretsManagerClientBuilder
import com.amazonaws.services.secretsmanager.model.GetSecretValueRequest
import org.json4s.{DefaultFormats, MappingException}
import org.json4s.jackson.JsonMethods._
import com.datarobot.prediction.spark.Predictors.{getPredictorFromServer, getPredictor}

My build.sbt:

libraryDependencies ++= Seq(
  "net.snowflake" % "snowflake-jdbc" % "3.12.5",
  "net.snowflake" % "spark-snowflake_2.11" % "2.7.1-spark_2.4",
  "com.datarobot" % "scoring-code-spark-api_2.4.3" % "0.0.19",
  "com.datarobot" % "datarobot-prediction" % "2.1.4",
  "com.amazonaws" % "aws-java-sdk-secretsmanager" % "1.11.789",
  "software.amazon.awssdk" % "regions" % "2.13.23"
) 
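The `assembly` task and the `assemblyMergeStrategy` key used in the answer come from the sbt-assembly plugin, which is normally enabled in `project/plugins.sbt`. A minimal sketch; the plugin version is an assumption (0.14.10 was a current 0.14.x release around the time of this question), so match it to your sbt version:

```scala
// project/plugins.sbt (sketch): enables `sbt assembly` and its settings keys.
// The version number is an assumption; pick one compatible with your sbt.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```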

Any ideas? Please advise.


1 Answer

Stack Overflow user

Accepted answer

Answered 2020-05-29 13:45:16

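You will need an `assemblyMergeStrategy` setting (see the sbt-assembly documentation). `sbt assembly` unpacks every jar on the classpath into a single jar, so whenever two jars contain an entry at the same path (for example `git.properties` or `javax/inject/Named.class`) with different contents, it stops with `deduplicate` unless a merge strategy says which copy to keep. The names of the jars don't matter; only the path inside them does.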

Random example:

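// build.sbt (requires the sbt-assembly plugin)
assemblyMergeStrategy in assembly := {
  // xs @ _* matches every path under META-INF, not just direct children
  // (note: this also drops META-INF/services registrations)
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // git.properties sits at the jar root, so match the bare path
  case "git.properties"              => MergeStrategy.discard
  // Typesafe Config files must be concatenated, not overwritten
  case "application.conf"            => MergeStrategy.concat
  case "reference.conf"              => MergeStrategy.concat
  // Everything else: keep the first copy found on the classpath
  case _                             => MergeStrategy.first
}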
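A more targeted variant, as a hedged sketch: rather than falling back to `MergeStrategy.first` for every path, override only the paths reported in the question and hand everything else to sbt-assembly's default strategy, which merges `META-INF/services` files line by line instead of discarding them (JDBC drivers, for instance, can register themselves that way). It uses the `assembly / assemblyMergeStrategy` slash syntax (sbt 1.1+), equivalent to `assemblyMergeStrategy in assembly`. Choosing `first` for each case is an assumption that the duplicate copies are interchangeable.

```scala
// build.sbt (sketch): override only the conflicting paths from the question
// and delegate every other path to sbt-assembly's default merge strategy.
assembly / assemblyMergeStrategy := {
  // The same placeholder class ships in several Spark module jars
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  // package-info.class holds only package-level annotations
  case PathList(ps @ _*) if ps.last == "package-info.class" =>
    MergeStrategy.first
  // javax.inject comes both standalone and repackaged by Glassfish HK2
  case PathList("javax", "inject", xs @ _*) =>
    MergeStrategy.first
  // Build metadata from the Arrow jars; not needed at runtime
  case "git.properties" =>
    MergeStrategy.discard
  case x =>
    // Fall back to the plugin's defaults for everything else
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```

This does not cover the conflicts that appear only with the `lib` symlink (the `-amzn` jars from `/usr/lib/spark/jars`); those disappear once EMR's own jars are no longer bundled.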
Score: 3
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/62079887
