TL;DR: nodeSelector ignores nodes from another node pool. How can I distribute pods across multiple node pools, using node labels or some other technique?
I have two node pools like these:
...
# Spot node pool
resource "azurerm_kubernetes_cluster_node_pool" "aks_staging_np_compute_spot" {
  name            = "computespot"
  (...)
  vm_size         = "Standard_F8s_v2"
  max_count       = 2
  min_count       = 2
  (...)
  priority        = "Spot"
  eviction_policy = "Delete"
  (...)
  node_labels = {
    "pool_type" = "compute"
  }
}
# Regular node pool
resource "azurerm_kubernetes_cluster_node_pool" "aks_staging_np_compute_base" {
  name      = "computebase"
  (...)
  vm_size   = "Standard_F8s_v2"
  max_count = 2
  min_count = 2
  node_labels = {
    "pool_type" = "compute"
  }
}

Both pools are deployed in AKS and all of their nodes are in the Ready state. Note two things:
- the same pool_type: compute label on both pools
- the same VM size, Standard_F8s_v2
(My cluster also has another 20 nodes with different labels, but those don't matter here.)
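As an aside (not part of the original question), nodes from both pools should end up carrying the same custom label. A sketch of the relevant node metadata, with hypothetical node names; agentpool is the standard label AKS derives from the pool name:

```yaml
# Hypothetical node metadata for one node from each pool.
apiVersion: v1
kind: Node
metadata:
  name: aks-computespot-10000000-vmss000000   # hypothetical name
  labels:
    agentpool: computespot                    # added by AKS
    pool_type: compute                        # from node_labels in Terraform
---
apiVersion: v1
kind: Node
metadata:
  name: aks-computebase-10000000-vmss000000   # hypothetical name
  labels:
    agentpool: computebase
    pool_type: compute                        # identical label on both pools
```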
Then I have a Deployment like this (irrelevant lines omitted for brevity):
apiVersion: apps/v1
kind: Deployment
metadata:
  (...)
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    (...)
    spec:
      nodeSelector:
        pool_type: compute
      (...)
      containers:
      (...)

There is also a tolerations entry so the pods are accepted on Azure spot instances. It clearly works:
      tolerations:
      - key: "kubernetes.azure.com/scalesetpriority"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"

The problem is that the application gets deployed to only one of the node pools (computespot in this case) and never touches the other one (computebase), even though the labels match and the individual nodes are the same size.
Two of the pods run on the computespot nodes (one per node), and the rest stay Pending with: 0/24 nodes are available: 14 Insufficient cpu, 17 Insufficient memory, 4 node(s) didn't match node selector. — which is an absolute lie, because I can see that the computebase nodes are completely empty. How can I fix this?
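For context, the toleration in the Deployment above is needed because AKS automatically taints every node in a Spot node pool; only pods that tolerate the taint can be scheduled there. The taint looks like this (standard AKS behavior, not shown in the question):

```yaml
# Taint AKS applies to each node of a Spot node pool.
spec:
  taints:
  - key: kubernetes.azure.com/scalesetpriority
    value: spot
    effect: NoSchedule
```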
Answered on 2021-10-19 16:34:04
Found a solution using node affinity.
spec:
  # This didn't work:
  #
  # nodeSelector:
  #   pool_type: compute
  #
  # But this does:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: pool_type
            operator: In
            values:
            - compute

I don't know why, since we are still matching on a single label. If anyone knows the reason, please share.
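As a further aside (not part of the original answer): if the goal from the TL;DR is to let pods use both pools while favoring the cheaper spot nodes, the required pool_type term can be combined with a preferred (soft) term on the standard AKS spot label, so pods spill over to computebase when the spot nodes are full. A sketch, assuming the default kubernetes.azure.com/scalesetpriority node label:

```yaml
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: only nodes from the two "compute" pools.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: pool_type
            operator: In
            values:
            - compute
      # Soft preference: favor spot nodes when they have capacity.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: kubernetes.azure.com/scalesetpriority
            operator: In
            values:
            - spot
```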
https://stackoverflow.com/questions/69631881