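I have a Job object that should use a node selector so that it only runs on nodes that have a GPU. I know how to set this up (the manifest is built from a string in a Python program).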
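job = f"""
apiVersion: batch/v1
kind: Job
....
      nodeSelector:
        sma-gpu-size: {gpu_size}
"""

Our ops team will set up this selector label on the nodes within the next few weeks, but for the moment, when the node selector is set, the service cannot start.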
2022-09-20T07:20:24Z [Warning] 0/35 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 30 node(s) didn't match Pod's node affinity/selector.

Is it possible to use these node selectors only if they are available (pseudo-YAML)?
job = f"""
apiVersion: batch/v1
kind: Job
....
      nodeSelector:
        if_available:
          sma-gpu-size: {gpu_size}
        else:
          Any
"""

Posted on 2022-09-20 11:58:51
No, that is not possible with nodeSelector, but you can achieve this by using nodeAffinity instead of nodeSelector.
spec:
  [...]
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      # each preferred term needs a weight (1-100) and a preference
      - weight: 1
        preference:
          matchExpressions:
          - key: sma-gpu-size
            operator: In
            values:
            - {gpu_size}

From the documentation:
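preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.

Once the labels have been added to the nodes, you can switch to requiredDuringSchedulingIgnoredDuringExecution:

requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met. This functions like nodeSelector, but with a more expressive syntax.

Or you can go back to nodeSelector.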
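
Since the manifest is generated from Python, the switch between the soft and the hard rule could be handled in code. The following is only a minimal sketch under a few assumptions: the helper names `node_affinity` and `build_job`, the Job name `gpu-job`, the container name and image, and the example value `"large"` are all hypothetical and not taken from the question. It builds the manifest as a dict instead of an f-string, which avoids indentation mistakes and makes sure the label value is emitted as a string.

```python
import json


def node_affinity(gpu_size, required=False):
    """Build the `affinity` section of a pod spec for the sma-gpu-size label.

    required=False: soft preference, the pod is still scheduled if no node matches.
    required=True:  hard rule, comparable to a plain nodeSelector.
    """
    expression = {
        "key": "sma-gpu-size",
        "operator": "In",
        "values": [str(gpu_size)],  # label values must be strings
    }
    if required:
        rule = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [expression]}]
            }
        }
    else:
        rule = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                # weight must be between 1 and 100
                {"weight": 1, "preference": {"matchExpressions": [expression]}}
            ]
        }
    return {"nodeAffinity": rule}


def build_job(gpu_size, required=False):
    """Return a Job manifest as a dict (names and image are placeholders)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "gpu-job"},  # hypothetical name
        "spec": {
            "template": {
                "spec": {
                    "affinity": node_affinity(gpu_size, required),
                    "restartPolicy": "Never",
                    # placeholder container, replace with the real workload
                    "containers": [{"name": "main", "image": "busybox"}],
                }
            }
        },
    }


if __name__ == "__main__":
    # JSON manifests are accepted wherever YAML manifests are.
    print(json.dumps(build_job("large"), indent=2))
```

While the nodes are still unlabeled, `build_job(gpu_size)` yields the soft preference. Once the ops team has added the labels, passing `required=True` produces the hard rule without touching the rest of the manifest.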
https://stackoverflow.com/questions/73786115