I am trying to submit a Spark job using 'gcloud dataproc jobs submit spark'. To connect to the ES cluster, I need to pass a truststore path.

The job succeeds if I copy the truststore file to every worker node and give the absolute path:

esSparkConf.put("es.net.ssl.truststore.location", "file:///tmp/trust.jks");

But I don't want to do that; copying the file to every node becomes impractical as the number of worker nodes grows.

I tried passing the truststore file with the --files option instead, like this:

gcloud dataproc jobs submit spark --cluster=sprk-prd1 --region=<> --files=trust.jks --class=ESDumpJob --jars=gs://randome/jars/ESDump-jar-with-dependencies.jar

Code snippet from ESDumpJob:
SparkConf sparkConf = new SparkConf(true).setAppName("My ES job");
sparkConf.set("spark.es.nodes.wan.only", "true")
    .set("spark.es.nodes", <es_nodes>)
    .set("spark.es.net.ssl", "true")
    .set("spark.es.net.ssl.truststore.location", "trust.jks")
    .set("spark.es.net.ssl.truststore.pass", "pass")
    .set("spark.es.net.http.auth.user", "test")
    .set("spark.es.net.http.auth.pass", "test");
sparkSession = SparkSession
.builder().master("local")
.config(sparkConf)
.config("spark.scheduler.mode", "FAIR")
.getOrCreate();
JavaRDD<MyData> data = //create rdd
JavaEsSpark.saveToEs(data, "my_index", ImmutableMap.of("es.mapping.id", "id"));

In this case, the job fails with the error below:
17:15:42 Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Expected to find keystore file at [trust.jks] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.
17:15:42 at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadKeyStore(SSLSocketFactory.java:195)
17:15:42 at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.loadTrustManagers(SSLSocketFactory.java:226)
17:15:42 at org.elasticsearch.hadoop.rest.commonshttp.SSLSocketFactory.createSSLContext(SSLSocketFactory.java:173)

Answered on 2022-04-28 23:45:49
You need to use org.apache.spark.SparkFiles.get(fileName) to resolve the actual local path, and add the file:// prefix.
sparkConf.set(
"spark.es.net.ssl.truststore.location",
    "file://" + org.apache.spark.SparkFiles.get("trust.jks"));

See SparkFiles.get and this question:
https://stackoverflow.com/questions/72043171
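Putting it together, the driver-side configuration might look like the sketch below. This assumes trust.jks is still distributed via --files on the submit command; the class name and the elided settings are illustrative, not a verified complete program:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;

public class ESDumpJob {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf(true).setAppName("My ES job");
        // SparkFiles.get resolves the bare file name passed via --files
        // to the absolute path where Spark staged the file locally,
        // so elasticsearch-hadoop can load it from a valid file:// URI.
        sparkConf.set("spark.es.net.ssl.truststore.location",
            "file://" + SparkFiles.get("trust.jks"));
        // ... remaining spark.es.* settings as in the original snippet ...
    }
}
```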