Running dockerized Spark on Mesos breaks Mesos

Stack Overflow user
Asked 2019-03-12 16:08:52
Answers: 1 | Views: 117 | Followers: 0 | Votes: 0

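I am trying to run a Jupyter Notebook Docker container (based on Ubuntu 16.04) on DC/OS, using Spark on Mesos. Python printed a lot of unhelpful error messages, but after attaching to the container and running a spark-submit job from inside it, I got many errors about connection problems.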

The Spark driver cannot connect to Mesos properly. In most cases setting LIBPROCESS_IP seems to be enough; in my case, however, setting it hangs Mesos completely.

Here is what I run inside the Docker container:

export LIBPROCESS_ADVERTISE_IP=172.16.6.105
export SPARK_HOME=spark-2.3.2-bin-hadoop2.6
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export LIBPROCESS_IP=172.19.0.4
./spark-2.3.2-bin-hadoop2.6/bin/spark-submit \
  --master mesos://leader.mesos:5050 \
  --class org.apache.spark.examples.SparkPi \
  https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 30
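
A quick sanity check that may help with these variables, assuming the usual libprocess semantics: `LIBPROCESS_IP` is the address the scheduler binds to inside the container, while `LIBPROCESS_ADVERTISE_IP` is the address it announces to the master. The Python sketch below only verifies that the bind address is assigned to a local interface. The helper name `can_bind` is made up for illustration, and the check does not test whether the master can actually reach the advertised address.

```python
import os
import socket


def can_bind(ip: str) -> bool:
    """Return True if a TCP socket can bind to `ip` on an ephemeral port."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((ip, 0))  # port 0 lets the OS pick any free port
        return True
    except OSError:
        # Raised when the address is invalid or not assigned to a local interface.
        return False


if __name__ == "__main__":
    bind_ip = os.environ.get("LIBPROCESS_IP")
    advertise_ip = os.environ.get("LIBPROCESS_ADVERTISE_IP")
    if bind_ip is None:
        print("LIBPROCESS_IP is not set")
    else:
        print(f"LIBPROCESS_IP={bind_ip} bindable locally: {can_bind(bind_ip)}")
    print(f"LIBPROCESS_ADVERTISE_IP={advertise_ip}")
```

If the container uses bridge networking, the master may also be unable to reach the scheduler's port at the advertised address, which this check cannot detect.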

The Spark driver hangs at this point:

I0312 07:18:13.722151  3764 sched.cpp:232] Version: 1.2.3
I0312 07:18:13.732707  3758 sched.cpp:336] New master detected at master@172.16.6.103:5050
I0312 07:18:13.733749  3758 sched.cpp:352] No credentials provided. Attempting to register without authentication

At this step Mesos hangs. The UI is not reachable at all, and the DC/OS poststart checks report errors.

I checked the Mesos logs, and this is what I saw:

mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911664 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911737 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911801 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911841 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912062 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912149 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912243 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912281 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912369 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912441 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912499 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912534 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912771 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912860 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912921 32335 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.912957 32335 master.cpp:3048] Framework f1731b0f-a064-434f-8f15-2225a57ce2de-0014 (Spark Pi) at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534 already subscribed, resending acknowledgement

Sometimes I also see this:

mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638309   855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638342   855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638381   855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638442   855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638475   855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638514   855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638572   855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638605   855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638644   855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638715   855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638751   855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638790   855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638847   855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638881   855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.638921   855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.638978   855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.639011   855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement
mar 12 08:35:13 centos-master-01 mesos-master[837]: W0312 08:35:11.639060   855 master.hpp:2322] Master attempted to send message to disconnected framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.639118   855 master.cpp:3038] Subscribing framework Spark Pi with checkpointing disabled and capabilities [  ]
mar 12 08:35:13 centos-master-01 mesos-master[837]: I0312 08:35:11.639153   855 master.cpp:3048] Framework e40238eb-4b82-4883-be2c-54103b84dea0-0009 (Spark Pi) at scheduler-0ae862ca-bf59-4f80-8d95-9d244c796547@172.16.6.105:35139 already subscribed, resending acknowledgement

This repeats indefinitely. When I stop the driver, Mesos stays broken and keeps printing these messages:

mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871507 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871595 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871671 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871744 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871811 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871911 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.871979 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.872048 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534
mar 12 08:19:49 centos-master-01 mesos-master[32324]: I0312 08:19:41.872140 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534

So it looks like the Spark driver is flooding Mesos with SUBSCRIBE calls faster than Mesos can process them. I tried Spark 2.3.2 and 2.4.0, with the same result.
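
To put a number on that flood, a small Python sketch like the following could count the `Received SUBSCRIBE call` lines per second and per scheduler. It assumes the master log format shown in the excerpts above; the function name `count_subscribes` and the regular expression are illustrative, not part of Mesos or Spark.

```python
import re
from collections import Counter

# Matches master log lines such as:
#   ... I0312 08:19:31.911664 32335 master.cpp:2958] Received SUBSCRIBE call
#   for framework 'Spark Pi' at scheduler-<uuid>@172.16.6.105:38534
SUBSCRIBE_RE = re.compile(
    r"[IWE]\d{4} (\d{2}:\d{2}:\d{2})\.\d+\s+\d+ master\.cpp:\d+\] "
    r"Received SUBSCRIBE call for framework '([^']*)' at (\S+)"
)


def count_subscribes(lines):
    """Count SUBSCRIBE calls keyed by (second, framework name, scheduler address)."""
    counts = Counter()
    for line in lines:
        match = SUBSCRIBE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the log excerpt above, used as sample input.
    sample = [
        "mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911664 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534",
        "mar 12 08:19:36 centos-master-01 mesos-master[32324]: I0312 08:19:31.911737 32335 master.cpp:2958] Received SUBSCRIBE call for framework 'Spark Pi' at scheduler-e6d4dc88-8470-4519-967d-c86c2fee1c39@172.16.6.105:38534",
    ]
    for (second, framework, scheduler), n in sorted(count_subscribes(sample).items()):
        print(second, n, framework, scheduler)
```

Feeding the full master log to `count_subscribes` instead of the two sample lines would show how many calls arrive within a single second from the same scheduler address.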

I also tried connecting Spark to the Spark Mesos Dispatcher, but even with these LIBPROCESS variables set I got the usual connection errors:

E0312 08:01:55.658208  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.658838  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.659353  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.660073  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.660650  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.661358  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.662775  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.663313  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.663964  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected
E0312 08:01:55.664711  4874 process.cpp:2431] Failed to shutdown socket with fd 279: Transport endpoint is not connected

Has anyone run into this problem? How can I fix it?


1 Answer

Stack Overflow user (accepted answer)
Answered 2019-03-12 16:59:52

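I ran Spark on Mesos using Docker Compose. I already had a Docker image with Mesos installed and had configured the Mesos cluster, that is, I had decided which nodes are the master and which are the workers. Then I wrote the following Compose files for the master and the slaves. They work without errors.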

Compose file for the master:

version: '3.7'
services:
  master:
   image: ubuntu_mesos_spark
   command: bash -c "sleep 40; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.136 --work_dir=/var/run/mesos --hostname=x.x.x.x"  ##hostname : IP of the master node
   restart: always
   network_mode: host
   environment:
    - MESOS_HOSTNAME="150.20.11.136"
    - MESOS_QUORUM=1
    - MESOS_LOG_DIR=/var/log/mesos
   expose:
    - 5050
    - 4040
    - 7077
    - 8080
   ports:
    - 5050:5050
    - 4040:4040
    - 7077:7077
    - 8080:8080

Compose file for the slave:

  version: '3.7'
  services: 
    slave:
      image: ubuntu_mesos_spark
      command: bash -c "sleep 40; /home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.136:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
      restart: always
      privileged: true
      network_mode: host
      environment:
      - MESOS_HOSTNAME="150.20.11.157"
      - MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins #also in Dockerfile
      - MESOS_LOG_DIR=/var/log/mesos
      - MESOS_LOGGING_LEVEL=INFO
      expose:
      - 5051
      ports:
      - 5051:5051

I hope this helps.

Votes: 1
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/55116751