
dask-kubernetes KubeCluster卡住

Stack Overflow user
Asked 2021-08-17 18:57:57
1 answer · 141 views · 0 followers · 0 votes

I am trying to run dask on kubernetes. Below is a hello world for dask-kubernetes, but I am stuck on the error below.

main.py:

import os
from dask_kubernetes import KubeCluster
from dask.distributed import Client
import dask.array as da


if __name__ == '__main__':
    path_to_src = os.path.dirname(os.path.abspath(__file__))
    cluster = KubeCluster(os.path.join(path_to_src, 'pod-spec.yaml'), namespace='124381-dev')
    print('Cluster constructed')

    cluster.scale(10)
    # print('Cluster scaled')

    # Connect Dask to the cluster
    client = Client(cluster)
    print('Client constructed')

    # Create a large array and calculate the mean
    array = da.ones((100, 100, 100))
    print('Created big array')
    print(array.mean().compute())  # Should print 1.0
    print('Computed mean')

Output:

$ python src/main.py 
Creating scheduler pod on cluster. This may take some time.
Forwarding from 127.0.0.1:60611 -> 8786
Handling connection for 60611
Handling connection for 60611
Handling connection for 60611
Cluster constructed
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f1f874b8130>>, <Task finished name='Task-54' coro=<SpecCluster._correct_state_internal() done, defined at /home/cliff/anaconda3/envs/dask/lib/python3.8/site-packages/distributed/deploy/spec.py:327> exception=TypeError("unsupported operand type(s) for +=: 'NoneType' and 'list'")>)
Traceback (most recent call last):
  File "/home/cliff/anaconda3/envs/dask/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/home/cliff/anaconda3/envs/dask/lib/python3.8/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/home/cliff/anaconda3/envs/dask/lib/python3.8/site-packages/distributed/deploy/spec.py", line 358, in _correct_state_internal
    worker = cls(self.scheduler.address, **opts)
  File "/home/cliff/anaconda3/envs/dask/lib/python3.8/site-packages/dask_kubernetes/core.py", line 151, in __init__
    self.pod_template.spec.containers[0].args += worker_name_args
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'list'
Handling connection for 60611
Handling connection for 60611
Client constructed
Created big array

Note that there is no terminal prompt at the end of the output - the script is still running, but it never makes progress. In another terminal, kubectl get pods also shows the "cliff-testing" pod as Running.
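The TypeError in the traceback comes from dask-kubernetes appending worker-name arguments to the container's args list; when the pod template's container defines no args, that attribute is None, and in-place list concatenation onto None fails. A minimal sketch of the failing operation in isolation (the values in worker_name_args are hypothetical stand-ins, not what dask-kubernetes actually appends):

```python
# Reproduce the failing "+=" from dask_kubernetes/core.py in isolation.
# A container spec without an "args" key gives args == None.
args = None
worker_name_args = ["--name", "dask-worker-0"]  # hypothetical values

try:
    args += worker_name_args
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +=: 'NoneType' and 'list'
```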

pod-spec.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: cliff-testing
  labels:
    app: cliff-docker-test
spec:
  imagePullSecrets:
  - name: <redacted>
  securityContext:
    runAsUser: 1000
  restartPolicy: OnFailure
  containers:
  - name: cliff-test-container
    image: <redacted: works with docker pull>
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 2
        memory: 4G
      requests:
        cpu: 1
        memory: 2G

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-08-24 20:33:30

The pod template (pod-spec.yaml) sets the metadata.name field. Removing it allows the code to run. It appears that dask-kubernetes creates a scheduler pod named "dask-…" and follows the same naming scheme for the workers. With the name hard-coded in the pod template, dask-kubernetes tried to create worker pods with the same name as the scheduler pod (and as each other), which is not allowed.
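As a sketch, here is the same pod template with the metadata.name field removed and every other field unchanged, so that dask-kubernetes can generate unique pod names itself:

```yaml
apiVersion: v1
kind: Pod
metadata:
  # no "name" field - dask-kubernetes generates unique pod names
  labels:
    app: cliff-docker-test
spec:
  imagePullSecrets:
  - name: <redacted>
  securityContext:
    runAsUser: 1000
  restartPolicy: OnFailure
  containers:
  - name: cliff-test-container
    image: <redacted: works with docker pull>
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 2
        memory: 4G
      requests:
        cpu: 1
        memory: 2G
```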

If you want to name the pods differently, you can pass the name keyword argument when constructing KubeCluster (dask automatically appends a random string to each pod's name).

For example, the following names each pod (scheduler and workers) with the prefix "my-dask-pods-":

from dask_kubernetes import KubeCluster
cluster = KubeCluster('pod-spec.yaml', name='my-dask-pods-')
Votes: 0
Original page content provided by Stack Overflow. Translation supported by Tencent Cloud's IT-domain engine.
Original link:

https://stackoverflow.com/questions/68822702
