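I am new to Spark/Scala development. I build my project with Maven and use IntelliJ as my IDE. I am trying to query a Hive table and then iterate over the resulting DataFrame with foreach. Here is my code: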
try {
  val DF_1 = hiveContext.sql("select distinct(address) from test_table where trim(address)!=''")
  println("number of rows: " + DF_1.count)
  DF_1.foreach(x => {
    val y = hiveContext.sql("select place from test_table where address='" + x(0).toString + "'")
    if (y.count > 1) {
      println("Multiple place values for address: " + x(0).toString)
      y.foreach(r => println(r))
      println("*************")
    }
  })
} catch { case e: Exception => e.printStackTrace() }
In each iteration I query the same table for another column, to check whether any address in test_table has multiple place values. There are no compile errors and the application builds successfully. However, when I run the code above, I get the following error:
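java.lang.NoClassDefFoundError: Could not initialize class xxxxxxxx
The application starts successfully, prints the row count of DF_1, and then fails inside the foreach loop with the error above. I ran jar xvf on the jar and can see the main class, driver.class, in the listing: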
com/.../driver$$anonfun$1$$anonfun$apply$1.class
com/.../driver$$anonfun$1.class
com/.../driver$$anonfun$2.class
com/.../driver$$anonfun$3.class
com/.../driver$$anonfun$4.class
com/.../driver$$anonfun$5.class
com/.../driver$$anonfun$main$1$$anonfun$apply$1.class
com/.../driver$$anonfun$main$1$$anonfun$apply$2.class
com/.../driver$$anonfun$main$1$$anonfun$apply$3.class
com/.../driver$$anonfun$main$1.class
com/.../driver$$anonfun$main$10$$anonfun$apply$9.class
com/.../driver$$anonfun$main$10.class
com/.../driver$$anonfun$main$11.class
com/.../driver$$anonfun$main$12.class
com/.../driver$$anonfun$main$13.class
com/.../driver$$anonfun$main$14.class
com/.../driver$$anonfun$main$15.class
com/.../driver$$anonfun$main$16.class
com/.../driver$$anonfun$main$17.class
com/.../driver$$anonfun$main$18.class
com/.../driver$$anonfun$main$19.class
com/.../driver$$anonfun$main$2$$anonfun$apply$4.class
com/.../driver$$anonfun$main$2$$anonfun$apply$5.class
com/.../driver$$anonfun$main$2$$anonfun$apply$6.class
com/.../driver$$anonfun$main$2.class
com/.../driver$$anonfun$main$20.class
com/.../driver$$anonfun$main$21.class
com/.../driver$$anonfun$main$22.class
com/.../driver$$anonfun$main$23.class
com/.../driver$$anonfun$main$3$$anonfun$apply$7.class
com/.../driver$$anonfun$main$3$$anonfun$apply$8.class
com/.../driver$$anonfun$main$3.class
com/.../driver$$anonfun$main$4$$anonfun$apply$9.class
com/.../driver$$anonfun$main$4.class
com/.../driver$$anonfun$main$5.class
com/.../driver$$anonfun$main$6$$anonfun$apply$1.class
com/.../driver$$anonfun$main$6$$anonfun$apply$2.class
com/.../driver$$anonfun$main$6$$anonfun$apply$3.class
com/.../driver$$anonfun$main$6$$anonfun$apply$4.class
com/.../driver$$anonfun$main$6$$anonfun$apply$5.class
com/.../driver$$anonfun$main$6.class
com/.../driver$$anonfun$main$7$$anonfun$apply$1.class
com/.../driver$$anonfun$main$7$$anonfun$apply$2.class
com/.../driver$$anonfun$main$7$$anonfun$apply$3.class
com/.../driver$$anonfun$main$7$$anonfun$apply$4.class
com/.../driver$$anonfun$main$7$$anonfun$apply$5.class
com/.../driver$$anonfun$main$7$$anonfun$apply$6.class
com/.../driver$$anonfun$main$7$$anonfun$apply$7.class
com/.../driver$$anonfun$main$7$$anonfun$apply$8.class
com/.../driver$$anonfun$main$7.class
com/.../driver$$anonfun$main$8$$anonfun$apply$10.class
com/.../driver$$anonfun$main$8$$anonfun$apply$4.class
com/.../driver$$anonfun$main$8$$anonfun$apply$5.class
com/.../driver$$anonfun$main$8$$anonfun$apply$6.class
com/.../driver$$anonfun$main$8$$anonfun$apply$7.class
com/.../driver$$anonfun$main$8$$anonfun$apply$8.class
com/.../driver$$anonfun$main$8$$anonfun$apply$9.class
com/.../driver$$anonfun$main$8.class
com/.../driver$$anonfun$main$9$$anonfun$apply$11.class
com/.../driver$$anonfun$main$9$$anonfun$apply$7.class
com/.../driver$$anonfun$main$9$$anonfun$apply$8.class
com/.../driver$$anonfun$main$9$$anonfun$apply$9.class
com/.../driver$$anonfun$main$9.class
com/.../driver$.class
com/.../driver.class
When I launch the job in local mode instead of yarn mode, I do not get this error. What is causing the problem, and how can I fix it?
Any help would be appreciated. Thanks.
Posted on 2017-11-03 20:58:16
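It looks like your jar, or some of its dependencies, is not being distributed to the worker nodes. It works in local mode because the jars are already in place on that machine. In yarn mode you need to build a fat jar that contains all of the dependencies, including the hive and spark libraries.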
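Separately from packaging, it may be worth checking how the query inside foreach behaves on a cluster. The closure passed to DF_1.foreach runs on the executors, and a HiveContext created on the driver is generally not usable there, which can look fine in local mode and fail under yarn. The sketch below is not the original poster's code. It expresses the same check as a single query issued from the driver, assuming hiveContext is an org.apache.spark.sql.hive.HiveContext as in the question. The object and method names are made up for illustration, and it counts distinct place values per address, which is a guess at the intent (the question's code counts rows).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper, not part of the question's code.
object MultiplePlaceCheck {

  // Returns one row per non-blank address that maps to more than one
  // distinct place. The grouping runs as a single Spark SQL job, so no
  // query is issued from inside an executor-side closure.
  def addressesWithMultiplePlaces(hiveContext: HiveContext): DataFrame =
    hiveContext.sql(
      """select address, count(distinct place) as place_count
        |from test_table
        |where trim(address) != ''
        |group by address
        |having count(distinct place) > 1""".stripMargin)
}
```

On the driver this could be used as MultiplePlaceCheck.addressesWithMultiplePlaces(hiveContext).collect().foreach(println), provided the result is small enough to collect. Printing after collect() also keeps the output on the driver's console rather than in the executor logs.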
https://stackoverflow.com/questions/47064520