I'm new to Scala/Spark, so please don't be too harsh on me :)
I'm trying to run an EMR cluster on Amazon Web Services, executing a jar file I packaged with sbt package. When I run the code locally it works perfectly, but when I run it on the AWS EMR cluster I get this error:
ERROR Client: Application diagnostics message: User class threw exception: java.lang.NoClassDefFoundError: upickle/core/Types$Writer
As far as I can tell, this error stems from a Scala/Spark version mismatch in the dependencies.
I'm using Scala 2.12 and Spark 3.0.1, and on AWS I'm using emr-6.2.0.
Here is my build.sbt:
scalaVersion := "2.12.14"
libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.792"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-core" % "1.11.792"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.3.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1"
libraryDependencies += "com.lihaoyi" %% "upickle" % "1.4.1"
libraryDependencies += "com.lihaoyi" %% "ujson" % "1.4.1"我遗漏了什么?
谢谢!
Posted on 2021-09-15 07:58:11
If you use sbt package, the resulting jar contains only your project's own code, not its dependencies. You need to use sbt assembly to produce a so-called uberjar, which bundles the dependencies as well.
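For completeness, here is a minimal sketch of enabling sbt-assembly (the plugin version below is an assumption; check the plugin's releases for the current one). The merge strategy handles duplicate files, such as META-INF entries, that commonly clash when dependencies are merged into one jar:

// project/plugins.sbt -- plugin version is an assumption, use the latest release
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

// build.sbt -- a common merge strategy for duplicate files in the uberjar
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", _*) => MergeStrategy.discard  // drop conflicting metadata
  case _                        => MergeStrategy.first    // otherwise keep the first copy
}

Running sbt assembly then typically writes the uberjar to target/scala-2.12/.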
In your case, though, it's recommended to mark the Spark and Hadoop (and possibly AWS) dependencies as Provided, since they are already included in the EMR runtime. Use something like:
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1" % Providedhttps://stackoverflow.com/questions/69188747