My code is:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

APP_NAME = "mysql_query"

if __name__ == "__main__":
    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster("local[*]")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    hostname = "hostname"
    dbname = "database_name"
    jdbcPort = 3306
    username = "username"
    password = "password"
    jdbc_url = "jdbc:mysql://{}:{}/{}?user={}&password={}".format(hostname, jdbcPort, dbname, username, password)

    query = "(SELECT * XXXXXXX_XXXX_XXX_XX) t1_alias"
    df = sqlContext.read.format('jdbc').options(driver='com.mysql.jdbc.Driver', url=jdbc_url, dbtable=query).load()

This code currently lives in an S3 bucket. I have SSH'd into the EMR master node, and every time I submit the code with spark-submit --master yarn --deploy-mode cluster mysql_spark.py I get the error: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.
I have already installed the required JDBC driver. What is the problem here? Help!
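(A note on the query string above: JDBC sources accept a subquery wherever a table name is expected, but only if it is parenthesized and given an alias, which is why the value has the ( ... ) t1_alias shape. A minimal sketch of that wrapping, with a hypothetical helper name and an example query, no Spark required:)

```python
def as_dbtable(query, alias="t1_alias"):
    # Hypothetical helper: wraps a SQL query so it is valid as a
    # "dbtable" value for a JDBC data source (parenthesized + aliased).
    return "({}) {}".format(query, alias)

print(as_dbtable("SELECT id, name FROM users WHERE active = 1"))
# -> (SELECT id, name FROM users WHERE active = 1) t1_alias
```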
Posted on 2020-06-22 08:49:18
Try the following:
spark-submit --master yarn \
--deploy-mode cluster \
--jars mysql-connector-java-8.0.19.jar \
--driver-class-path mysql-connector-java-8.0.19.jar \
--conf spark.executor.extraClassPath=mysql-connector-java-8.0.19.jar \
mysql_spark.py

Reference: this answer
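Alternatively, if the nodes have outbound network access, Spark can resolve the connector from Maven at submit time instead of shipping a local jar; --packages distributes the dependency to both driver and executors. A sketch, assuming the same 8.0.19 connector version as above:

```shell
spark-submit --master yarn \
  --deploy-mode cluster \
  --packages mysql:mysql-connector-java:8.0.19 \
  mysql_spark.py
```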
https://stackoverflow.com/questions/62510233