I'm working in the IntelliJ IDE with Scala, and I want to use IAM user credentials to access a text file stored in S3. I have not installed Hadoop on my machine; I only want to use dependencies. I have already done this using the AWS SDK and jets3t dependencies, but now I want to do it with Spark.
The basic errors I get are:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found,
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.S3FileSystem not found
// and similarly
Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
Please tell me how I can fix this.
I tried adding various Hadoop dependencies on their own, plus the hadoop-aws dependency, and each gave me a different error. For example, adding "org.apache.hadoop" % "hadoop-aws" % "3.2.0" gives:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
at java.lang.ClassLoader.defineClass1(Native Method)
Adding "org.apache.hadoop" % "hadoop-common" % "3.1.1" gives:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkContext.withScope(SparkContext.scala:699)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:828)
at spark_scala_s3$.main(spark_scala_s3.scala:40)
for the line:
val df = sc.textFile(s"s3a://my-week6-spark/$path")
Adding "org.apache.hadoop" % "hadoop-aws" % "2.7.3" gives:
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400,
AWS Service: Amazon S3, AWS Request ID: 90C42E72BEEB31FB, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: (someid).
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: AKIAUAOZGFGM2K5WEHFC:l0IHiYq4ApJEewbjKR00KwKA+Ra)
and
The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.
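(For context: hadoop-aws 2.7.x signs S3 requests with the older SigV2 scheme, and regions that only accept AWS4-HMAC-SHA256 reject it. A commonly suggested workaround, not verified here, is to enable SigV4 in the AWS SDK and point s3a at the bucket's regional endpoint; the region below is an assumption and would have to match the bucket's actual region:)

// Hypothetical SigV4 workaround for hadoop-aws 2.7.x, placed after sc is created.
System.setProperty("com.amazonaws.services.s3.enableV4", "true") // AWS SDK v1 switch
sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com") // assumed region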
Code:

// Dependencies in sbt:
version := "0.1"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
// and others I added to see what would work, but they gave the errors above

The actual Scala file code:
// just 1 import
import org.apache.spark.sql.SparkSession

object spark_scala_s3 {
  def main(args: Array[String]): Unit = {
    val s = SparkSession.builder().appName("trial2").master("local").getOrCreate()
    val sc = s.sparkContext
    val accessKeyId: String = "acc key"
    val secretAccessKey: String = "secret acc key"
    val path: String = "file name" // placeholder: the file name inside the bucket

    // credentials for each of the three S3 connectors
    sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", accessKeyId)
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
    sc.hadoopConfiguration.set("fs.s3a.access.key", accessKeyId)
    sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", secretAccessKey)
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", secretAccessKey)

    // filesystem implementation classes for each URI scheme
    sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem")

    try {
      // my-week6-spark is the bucket name; path is the file name
      val df = sc.textFile(s"s3a://my-week6-spark/$path")
      println("DF.show() 1\n" + df.collect().mkString("\n"))
    } catch {
      case exception: Exception => println("1 failed as " + exception)
    }
    try {
      val df = sc.textFile(s"s3n://my-week6-spark/$path")
      println("DF.show() 2\n" + df.collect().mkString("\n"))
    } catch {
      case exception: Exception => println("2 failed as " + exception)
    }
    try {
      val df = sc.textFile(s"s3://my-week6-spark/$path")
      println("DF.show() 3\n" + df.collect().mkString("\n"))
    } catch {
      case exception: Exception => println("3 failed as " + exception)
    }
  }
}

I expected the file's contents to be read and something printed from them, but instead I get errors such as:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
at java.lang.ClassLoader.defineClass1(Native Method)
As described above, each dependency shows a different error.
I found a solution online that said to set the Hadoop path, as in: export hadoop_path = some path.
But since I have not installed Hadoop, I cannot supply an installation path.
Posted on 2019-06-19 21:31:17
I solved the problem by adding the following dependencies from Maven, using the same version for the hadoop-aws, hadoop-common, and hadoop-mapreduce dependencies:
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "3.1.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.1.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "3.1.2"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.8.7"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.12" % "2.8.7"由于内置的jackson jar不受支持,我不得不在新版本中覆盖它们,而不是仅仅添加它们。
https://stackoverflow.com/questions/56644069