I am using the Cloudera quickstart VM to test some pyspark work. For one task, I need to add the spark-csv package. Here is what I did:
PYSPARK_DRIVER_PYTHON=ipython pyspark -- packages com.databricks:spark-csv_2.10:1.3.0
pyspark starts up fine, but I do get these warnings:
**16/02/09 17:41:22 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/02/09 17:41:22 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/09 17:41:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable**
Then I run my code in pyspark:
yelp_df = sqlCtx.load(
    source="com.databricks.spark.csv",
    header='true',
    inferSchema='true',
    path='file:///directory/file.csv')
But I get an error message:
Py4JJavaError: An error occurred while calling o19.load.
: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
    at scala.sys.package$.error(package.scala:27)
What could be going wrong? Thanks in advance for your help.
Answered 2016-02-26 01:23:45
Try this:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0
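That is, with no space between the two dashes and "packages". Your command has a typo: "-- packages" is not read as the --packages option, so the spark-csv package is most likely never added to the session, which would explain why the com.databricks.spark.csv data source class fails to load.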
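To see why a single space matters, here is a minimal sketch that uses Python's standard shlex module to split both command lines the way a POSIX shell would. It only illustrates tokenization and does not launch Spark. The helper name has_packages_option is hypothetical and introduced purely for this example.

```python
import shlex

# The command from the question (stray space after "--") and the corrected one.
broken = "pyspark -- packages com.databricks:spark-csv_2.10:1.3.0"
fixed = "pyspark --packages com.databricks:spark-csv_2.10:1.3.0"


def has_packages_option(command):
    """Hypothetical helper: True if '--packages' is a single argument token."""
    return "--packages" in shlex.split(command)


# With the stray space, "--" and "packages" arrive as two separate arguments.
print(shlex.split(broken))
# Without the space, "--packages" is one argument followed by its value.
print(shlex.split(fixed))

print(has_packages_option(broken))
print(has_packages_option(fixed))
```

In the broken form the launcher receives a bare "--" followed by the word "packages", so nothing on the command line asks for the spark-csv coordinates to be resolved.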
https://stackoverflow.com/questions/35261364