I'm running Spark 1.2.1 on three nodes, each running a worker configured as a slave, and I start the cluster and run daily jobs like this:
./spark-1.2.1/sbin/start-all.sh
//crontab configuration:
./spark-1.2.1/bin/spark-submit --master spark://11.11.11.11:7077 --driver-class-path home/ubuntu/spark-cassandra-connector-java-assembly-1.2.1-FAT.jar --class "$class" "$jar"

I want the Spark master and workers to be available at all times, so that even if one of them fails it gets restarted like a service (the way Cassandra does).
Is there any way to do this?
Edit:
I looked at the start-all.sh script, and it only calls the start-master.sh and start-slaves.sh scripts. I tried to create a supervisor configuration file for it, but I only get the following errors:
11.11.11.11: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.13: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.
11.11.11.11: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.12: ssh: connect to host 11.11.11.13 port 22: No route to host
11.11.11.11: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.
11.11.11.12: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.13: ssh: connect to host 11.11.11.13 port 22: No route to host
11.11.11.11: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.

Posted 2016-03-15 11:27:42
Tools such as monit and supervisor (or even systemd) can monitor and restart failed processes.
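Note that start-all.sh is not a good candidate for supervision: it SSHes to the other nodes and backgrounds the daemons, then exits, which is why running it under supervisor produces the "Stop it first" and SSH errors above. A more workable approach is to supervise a foreground Spark process on each node via bin/spark-class. Below is a minimal supervisord sketch, assuming the install path /home/ubuntu/spark-1.2.1 and master address 11.11.11.11:7077 from the question; the program names and log paths are made up, so adjust them for your setup:

```ini
; hypothetical file: /etc/supervisor/conf.d/spark.conf
; On the master node only:
[program:spark-master]
command=/home/ubuntu/spark-1.2.1/bin/spark-class org.apache.spark.deploy.master.Master --host 11.11.11.11 --port 7077
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/spark-master.log

; On every worker node:
[program:spark-worker]
command=/home/ubuntu/spark-1.2.1/bin/spark-class org.apache.spark.deploy.worker.Worker spark://11.11.11.11:7077
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/spark-worker.log
```

Because spark-class keeps the JVM in the foreground, supervisor can track its PID and restart it on failure, which start-all.sh's detached daemons don't allow.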
https://stackoverflow.com/questions/36009436