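My Kubeflow pipeline component/job keeps running indefinitely even though the main execution has completed. Can anyone see from these logs why the job never finishes successfully?
It looks like a wait container is still running even though the main container has completed successfully.
Any insight is much appreciated.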
Type    Reason     Age   From               Message
----    ------     ----  ----               -------
Normal  Scheduled  10m   default-scheduler  Successfully assigned default/secondary-market-pipeline-6plbl-940127540 to gke-cluster-1-pool-1-46a6353b-wfpg
Normal  Pulled     10m   kubelet            Container image "gcr.io/cloud-marketplace/google-cloud-ai-platform/kubeflow-pipelines/argoexecutor:1.7.1" already present on machine
Normal  Created    10m   kubelet            Created container wait
Normal  Started    10m   kubelet            Started container wait
Normal  Pulling    10m   kubelet            Pulling image "<image>:latest"
Normal  Pulled     10m   kubelet            Successfully pulled image "<image>:latest" in 1.617667035s
Normal  Created    10m   kubelet            Created container main
Normal  Started    10m   kubelet            Started container main

Posted on 2021-11-15 16:47:53
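I found my solution in this thread: https://github.com/kubeflow/pipelines/issues/6793
My highmem nodes were not using the "Container-Optimized with Docker" image, and they needed to be. I created a new node pool with that image.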
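
For anyone who wants to check their own cluster for the same mismatch, here is a minimal sketch. It assumes the problem comes down to nodes that do not report a Docker runtime, which is my reading of the fix above rather than something the logs confirm. It reads the `status.nodeInfo.containerRuntimeVersion` field from a node list shaped like the output of `kubectl get nodes -o json`. The function name and the sample node names are hypothetical.

```python
import json


def nodes_without_docker(node_list):
    """Return (name, runtime) pairs for nodes whose runtime is not Docker.

    `node_list` is a dict shaped like the output of
    `kubectl get nodes -o json` (a List object with an "items" array).
    """
    flagged = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        # Reported by the kubelet, e.g. "docker://19.3.15" or "containerd://1.4.8".
        runtime = node["status"]["nodeInfo"]["containerRuntimeVersion"]
        if not runtime.startswith("docker://"):
            flagged.append((name, runtime))
    return flagged


# Hypothetical sample data, trimmed to the fields used above.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "pool-1-node"},
     "status": {"nodeInfo": {"containerRuntimeVersion": "docker://19.3.15"}}},
    {"metadata": {"name": "highmem-node"},
     "status": {"nodeInfo": {"containerRuntimeVersion": "containerd://1.4.8"}}}
  ]
}
""")

if __name__ == "__main__":
    for name, runtime in nodes_without_docker(SAMPLE):
        print(f"{name}: {runtime}")
```

To run it against a real cluster, replace `SAMPLE` with `json.load(sys.stdin)` and pipe in `kubectl get nodes -o json`. Any node it lists would be a candidate for moving to a node pool with the Docker-based image.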
https://stackoverflow.com/questions/69974007