
What is the root cause of the distributed.scheduler.KilledWorker exception?

Stack Overflow user
Asked 2019-07-10 14:14:09
1 answer, 775 views, 0 followers, 0 votes

I am trying to run a Dask job on a YARN cluster. The job uses the hdfs3 library to read from and write to HDFS.

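  • When I run it on a cluster without the Kerberos security layer, it runs fine.
  • On the cluster with the Kerberos security layer, however, I had to implement the solution here to avoid Kerberos-related errors. Running the same job then results in the following error: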
  File "/fsstreamdevl/f6_development/acoustics/acoustics_analysis_dask/acoustics_analytics/task_runner/task_runner.py", line 123, in run
    dask.compute(tasks)
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/dask/base.py", line 446, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/distributed/client.py", line 2568, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/distributed/client.py", line 1822, in gather
    asynchronous=asynchronous,
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/distributed/client.py", line 753, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/distributed/utils.py", line 331, in sync
    six.reraise(*error[0])
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/distributed/utils.py", line 316, in f
    result[0] = yield future
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/distributed/client.py", line 1653, in _gather
    six.reraise(type(exception), exception, traceback)
  File "/anaconda_env/projects/f6acoustics/dev/dask_yarn_test/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
distributed.scheduler.KilledWorker: ('__call__-6af7aa29-2a09-45f3-a5e2-207c06562672', <Worker 'tcp://10.194.211.132:11927', memory: 0, processing: 1>)
  • The strange thing is that if I run the same solution on the former cluster, the one without the Kerberos security layer, I get the same error.

Looking at the YARN application logs, I see the trace below, but I cannot tell what it means.

distributed.nanny - INFO - Closing Nanny at 'tcp://10.194.211.133:17659'
Traceback (most recent call last):
  File "/opt/hadoop/data/05/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_171773/container_e47_1560931326013_171773_01_000003/environment/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
    send_bytes(obj)
  File "/opt/hadoop/data/05/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_171773/container_e47_1560931326013_171773_01_000003/environment/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/opt/hadoop/data/05/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_171773/container_e47_1560931326013_171773_01_000003/environment/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/opt/hadoop/data/05/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_171773/container_e47_1560931326013_171773_01_000003/environment/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

End of LogType:dask.worker.log

I do not see any explicit message about low memory in the logs. Does anyone know how to diagnose this problem?
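One small first step in diagnosing this: the KilledWorker line in the traceback appears to carry the key of the task that could not be completed and the address of the last worker that held it, so that address can be matched against the per-container YARN logs. A minimal sketch, assuming the message format shown in the traceback above; `parse_killed_worker` is a hypothetical helper written for illustration, not part of distributed.

```python
import re
from urllib.parse import urlsplit

# Pattern for the final traceback line. The format is assumed from the
# single example in the question, not from a documented contract.
KILLED_WORKER = re.compile(
    r"KilledWorker: \('(?P<key>[^']+)', <Worker '(?P<address>[^']+)'"
)


def parse_killed_worker(line):
    """Return (task_key, host, port) from a KilledWorker line, or None."""
    match = KILLED_WORKER.search(line)
    if match is None:
        return None
    address = urlsplit(match.group("address"))
    return match.group("key"), address.hostname, address.port


line = (
    "distributed.scheduler.KilledWorker: "
    "('__call__-6af7aa29-2a09-45f3-a5e2-207c06562672', "
    "<Worker 'tcp://10.194.211.132:11927', memory: 0, processing: 1>)"
)
print(parse_killed_worker(line))
```

The host and port it returns identify which worker to look for in the container logs; the task key identifies which submitted call kept failing.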


1 Answer

Stack Overflow user

Answered 2019-07-10 14:27:20

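hdfs3 is no longer actively maintained. There are two main options for interacting with HDFS:

  • pyarrow's hdfs driver (via the libhdfs JNI library), which requires the Java and Hadoop requirements to be set up correctly and made available to the session that calls it.
  • webhdfs (for example, as in fsspec), which needs no Java libraries and can interact with Kerberos if HTTP authentication is allowed on the system.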
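The second option could look roughly like the sketch below. It assumes fsspec is installed, that the NameNode exposes WebHDFS over HTTP, and that the `requests-kerberos` package is available for the Kerberos handshake. The helper names and the host `namenode.example.com` are hypothetical; port 50070 is fsspec's default for this filesystem and may differ on your cluster.

```python
def webhdfs_options(host, port=50070, use_kerberos=True):
    """Build keyword arguments for fsspec's WebHDFS filesystem."""
    return {"host": host, "port": port, "kerberos": use_kerberos}


def open_webhdfs(host, port=50070, use_kerberos=True):
    """Return an fsspec WebHDFS filesystem for the given NameNode."""
    # Imported lazily so webhdfs_options() works without fsspec installed.
    import fsspec

    return fsspec.filesystem(
        "webhdfs", **webhdfs_options(host, port, use_kerberos)
    )


if __name__ == "__main__":
    # Hypothetical host; shows the arguments without touching the network.
    print(webhdfs_options("namenode.example.com"))
```

With a valid Kerberos ticket in the session, `open_webhdfs("namenode.example.com").ls("/")` would then list the HDFS root. Because this path is pure HTTP, the Dask workers need no Java or libhdfs setup.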
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/56972651
