On CDP 7, I am storing data into an HBase table from PySpark, following this example. The components I am using are listed below.
The command I used:
spark3-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml test-hbase3.py

However, I got an error. The full log is too long, so I put it on hastebin.com: spark log
Error snippet:
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 45, in <module>
main()
File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 24, in main
writeDF.write.options(catalog=writeCatalog, newtable=5).format(dataSourceFormat).save()
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.save.
: java.lang.NoClassDefFoundError: scala/Product$class
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:73)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:59)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)

What can I do to fix this error? I tried to find other connectors, but SHC is the only one available. I am not using any local Maven repository here, so I am not sure whether a dependency is missing or something else is wrong.
Posted on 2021-08-02 18:14:40
This is a Scala version conflict. Your shc-core jar was compiled for Scala 2.11, but you are running Scala 2.12, which is not binary-compatible with 2.11.
The simplest fix is to recompile shc-core from source against Scala 2.12 (though you may still run into compatibility issues, since the project apparently has not been tested with Scala 2.12).
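The mismatch is visible in the artifact coordinate itself: by Maven convention, the trailing `_2.11` in `com.hortonworks:shc-core:1.1.1-2.1-s_2.11` names the Scala binary version the jar was built for, while Spark 3.1 ships with Scala 2.12. As a minimal illustration of that naming convention (the helper function and regex are my own sketch, not part of any library):

```python
import re

def scala_binary_version(coordinate):
    """Extract the Scala binary version suffix (e.g. '2.11') from a
    Maven coordinate such as 'com.hortonworks:shc-core:1.1.1-2.1-s_2.11'.
    Returns None if the artifact carries no Scala suffix."""
    m = re.search(r"_(\d+\.\d+)(?::|$)", coordinate)
    return m.group(1) if m else None

# The coordinate passed to --packages in the failing spark3-submit command:
pkg = "com.hortonworks:shc-core:1.1.1-2.1-s_2.11"
print(scala_binary_version(pkg))  # → 2.11, but Spark 3.1 bundles Scala 2.12
```

A jar built for 2.11 loaded into a 2.12 runtime fails exactly as in the log above: Scala 2.12 changed how traits are compiled, so the synthetic `scala/Product$class` class from 2.11 no longer exists, hence the `NoClassDefFoundError`.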
You can explore other ways to solve the problem here:
https://stackoverflow.com/questions/68624355