I set up a standalone Spark cluster on Oracle Linux. On the master, I added this line to spark-env.sh:
export SPARK_MASTER_HOST=x.x.x.x
and I added the following lines to spark-env.sh on both the master and the worker:
export PYSPARK_PYTHON=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.8
I also added the worker's IP to the workers file on both the master and the worker (a sketch of that file follows the start commands below). I start the cluster like this. On the master:
/opt/spark/sbin/start-master.sh
And on the worker:
/opt/spark/sbin/start-worker.sh spark://x.x.x.x:7077
So I have exactly one master and one worker.
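For reference, the workers file (conf/workers in Spark 3.x; conf/slaves in older releases) simply lists one worker host or IP per line, so with a single worker it would look like this (placeholder IP):
# /opt/spark/conf/workers -- one worker host or IP per line
x.x.x.x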
My ~/.bashrc is configured as:
export JAVA_HOME=/opt/oracle/java/jdk1.8.0_25
export PATH=$JAVA_HOME/bin:$PATH
alias python=/usr/bin/python3.8
export LD_LIBRARY_PATH=/opt/oracle/instantclient_21_4:$LD_LIBRARY_PATH
export PATH=/opt/oracle/instantclient_21_4:$PATH
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_HOME=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=python3.8
export PYSPARK_PYTHON=/usr/bin/python3.8
When I run spark-submit, I get no errors, but the command runs forever without producing any result. I see these lines:
22/03/04 12:07:40 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks resource profile 0
22/03/04 12:07:41 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20220304120738-0000/0 is now EXITED (Command exited with code 1)
22/03/04 12:07:41 INFO StandaloneSchedulerBackend: Executor app-20220304120738-0000/0 removed: Command exited with code 1
22/03/04 12:07:41 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20220304120738-0000/3 on worker-20220304120443-192.9.200.68-42185 (192.9.200.68:42185) with 2 core(s)
22/03/04 12:07:41 INFO StandaloneSchedulerBackend: Granted executor ID app-20220304120738-0000/3 on hostPort 192.9.200.68:42185 with 2 core(s), 2.0 GiB RAM
When I check the worker log, I see the following errors:
22/03/04 12:07:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with m$
22/03/04 12:07:38 INFO ExecutorRunner: Launch command: "/opt/oracle/java/jdk1.8.0_25/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx2048M" "-Dspark.driver.port=40345" "-XX:+PrintGC$
22/03/04 12:07:38 INFO ExecutorRunner: Launch command: "/opt/oracle/java/jdk1.8.0_25/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx2048M" "-Dspark.driver.port=40345" "-XX:+PrintGC$
22/03/04 12:07:38 INFO ExecutorRunner: Launch command: "/opt/oracle/java/jdk1.8.0_25/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx2048M" "-Dspark.driver.port=40345" "-XX:+PrintGC$
22/03/04 12:07:41 INFO Worker: Executor app-20220304120738-0000/0 finished with state EXITED message Command exited with code 1 exitStatus 1
22/03/04 12:07:41 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 0
22/03/04 12:07:41 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20220304120738-0000, execId=0)
The spark-submit command looks like this:
/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --files etl/sparkConfig.json --py-files etl/brn_utils.py,etl/cst.py,etl/cst_utils.py,etl/emp_utils.py,etl/general_utils.py,etl/grouping.py,etl/grp_state.py,etl/conn.py etl/main.py
I tested as the root user and also created a dedicated spark user; nothing changed.
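For context, the contents of etl/main.py are not shown; a minimal stand-in driver that would exercise this cluster looks roughly like the following (the app name and workload are illustrative only, not the actual ETL code):

from pyspark.sql import SparkSession

# Simplified stand-in for etl/main.py; the master URL comes from
# --master on spark-submit, so it is not set here.
spark = SparkSession.builder.appName("etl").getOrCreate()  # "etl" is a placeholder name

# Trivial job with 2 partitions, matching the "2 tasks" in the log above.
rdd = spark.sparkContext.parallelize(range(100), 2)
print(rdd.sum())

spark.stop()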
Can you point out what is going wrong?
Thanks.
Answered on 2022-03-04 10:27:32
The problem is solved.
I think it was a network issue: the executors were launching but exiting with code 1, most likely because they could not connect back to the driver. Ever since I added this option to spark-submit, everything works fine:
--conf spark.driver.host=x.x.x.x
So in fact I ran it like this:
/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --conf spark.driver.host=x.x.x.x --files etl/sparkConfig.json --py-files etl/brn_utils.py,etl/cst.py,etl/cst_utils.py,etl/emp_utils.py,etl/general_utils.py,etl/grouping.py,etl/grp_state.py,etl/conn.py etl/main.py
Be careful to copy your program to the same path on all nodes. Also, since I access the cluster remotely, I use an SSH tunnel to reach the web UI from my own machine, like this:
ssh spark@master_ip -N -L 4040:master_ip:8080
In the command above, 4040 is the port on my computer and 8080 is the master's web UI port. Once the tunnel to master_ip:8080 is up, I can open the Spark UI by browsing to localhost:4040.
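Note that spark.driver.host can also be set inside the application instead of on the command line; a minimal sketch, using the same placeholder IP as above:

from pyspark.sql import SparkSession

# Equivalent to passing --conf spark.driver.host=x.x.x.x to spark-submit:
# this is the address the executors use to connect back to the driver.
spark = (
    SparkSession.builder
    .config("spark.driver.host", "x.x.x.x")  # placeholder; use a reachable driver IP
    .getOrCreate()
)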
Hope this helps.
https://stackoverflow.com/questions/71349431