I tried to launch the Python Spark shell with the following command:

bin/pyspark --packages datastax:spark-cassandra-connector:1.5.0-RC1-s_2.11,org.apache.spark:spark-streaming-kafka_2.10:1.6.0

The output of this command shows that it is able to find the spark-cassandra-connector package:
resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found datastax#spark-cassandra-connector;1.5.0-RC1-s_2.11 in spark-packages
found org.apache.cassandra#cassandra-clientutil;2.2.2 in central
found com.datastax.cassandra#cassandra-driver-core;3.0.0-rc1 in central
found io.netty#netty-handler;4.0.33.Final in central
found io.netty#netty-buffer;4.0.33.Final in central
found io.netty#netty-common;4.0.33.Final in central

However, when I try to import the package using either of the following commands, I get an ImportError:

from com.datastax import *
from com.datastax.spark.connector import *

Output:
ImportError: No module named com.datastax
ImportError: No module named com.datastax.spark.connector

Can anyone suggest what is going wrong here?
Posted on 2016-02-14 21:45:45
As far as I can tell, the Cassandra Connector doesn't contain a single line of Python code, let alone an oddly named Python module. Python interoperability is provided through the Data Source API, which can be used without any additional imports:

sqlContext.read.format("org.apache.spark.sql.cassandra").options(...).load(...)

Even if it did ship Python code, --packages is only used to distribute JVM dependencies. External dependencies (Python, R) must be distributed or installed separately, for example using pyFiles.
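To make the Data Source API call above concrete, here is a minimal sketch. It assumes a running Spark 1.x shell with the connector on the classpath and a reachable Cassandra cluster; the keyspace (`test`) and table (`users`) names are invented for illustration:

```python
# Sketch only: requires pyspark started with the spark-cassandra-connector
# package and a live Cassandra cluster. Keyspace/table names are hypothetical.

# In the pyspark shell, sqlContext already exists; no connector import needed.
df = sqlContext.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(keyspace="test", table="users") \
    .load()

df.show()
```

Note that no `from com.datastax...` import appears anywhere: the JVM-side connector is selected purely by the `format(...)` string.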
https://stackoverflow.com/questions/35390396