I have an Elasticsearch cluster running in a Kubernetes cluster. I moved the data pods onto memory-optimized nodes, which are tainted so that only Elasticsearch data pods get scheduled onto them. I currently have 3 memory-optimized EC2 instances for these data pods. They are r5.2xlarge with 64G of memory. Below is the output for one of the r5 nodes (they all look the same):
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 8
ephemeral-storage: 32461564Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 65049812Ki
pods: 110
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 8
ephemeral-storage: 29916577333
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 64947412Ki
pods: 110
System Info:
Machine ID: ec223b5ea23ea6bd5b06e8ed0a733d2d
System UUID: ec223b5e-a23e-a6bd-5b06-e8ed0a733d2d
Boot ID: 798aca5f-d9e1-4c9f-b75d-e16f7ba2d514
Kernel Version: 5.4.0-1024-aws
OS Image: Ubuntu 20.04.1 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.11
Kubelet Version: v1.18.10
Kube-Proxy Version: v1.18.10
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
amazon-cloudwatch fluentd-cloudwatch-tzsv4 100m (1%) 0 (0%) 200Mi (0%) 400Mi (0%) 21d
default prometheus-prometheus-node-exporter-tvmd4 100m (1%) 0 (0%) 0 (0%) 0 (0%) 21d
es elasticsearch-data-0 500m (6%) 1 (12%) 8Gi (12%) 8Gi (12%) 14m
kube-system calico-node-dhxg5 100m (1%) 0 (0%) 0 (0%) 0 (0%) 21d
kube-system kube-proxy-ip-10-1-12-115.us-gov-west-1.compute.internal 100m (1%) 0 (0%) 0 (0%) 0 (0%) 21d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 900m (11%) 1 (12%)
memory 8392Mi (13%) 8592Mi (13%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Here is what my cluster looks like:
kubectl get pods -n es
NAME READY STATUS RESTARTS AGE
elasticsearch-client-0 1/1 Running 0 77m
elasticsearch-client-1 1/1 Running 0 77m
elasticsearch-data-0 1/1 Running 0 77m
elasticsearch-data-1 1/1 Running 0 77m
elasticsearch-data-2 1/1 Running 0 77m
elasticsearch-data-3 0/1 Pending 0 77m
elasticsearch-data-4 0/1 Pending 0 77m
elasticsearch-data-5 0/1 Pending 0 77m
elasticsearch-data-6 0/1 Pending 0 77m
elasticsearch-data-7 0/1 Pending 0 77m
elasticsearch-master-0 2/2 Running 0 77m
elasticsearch-master-1 2/2 Running 0 77m
prometheus-elasticsearch-exporter-6d6c5d49cf-4w7gc 1/1 Running 0 22h
Here are the events when I describe one of the pending pods:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 56s (x5 over 3m35s) default-scheduler 0/11 nodes are available: 3 Insufficient memory, 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules, 5 node(s) didn't match node selector.
Here are the resource limits and requests for my data pods:
Limits:
cpu: 1
memory: 8Gi
Requests:
cpu: 500m
memory: 8Gi
Here is what my nodeAffinity looks like:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: es-data
operator: In
values:
- "true"
And my tolerations:
tolerations:
- key: "es-data"
operator: "Equal"
value: "true"
effect: "NoSchedule"
And here is the taint on the node when I describe it:
Taints: es-data=true:NoSchedule
I tainted it with:
kubectl taint nodes <node> es-data=true:NoSchedule
By my math, based on my (possibly flawed) understanding, my data pods only request 8Gi of memory each, from nodes that have 64G available, and only one pod requesting 8Gi is already running on each node. So in theory each node should have roughly 56G of memory left for other pods requesting to be scheduled onto it. Even the allocated-memory figures show only 13% in use. Why can't the pending pods be scheduled? How do I troubleshoot this? Am I misunderstanding how this works? What else can I tell you that would help figure this out?
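For reference, the scheduling-relevant pieces quoted above fit together in the data pod spec roughly like this (a minimal sketch assembled from the snippets in the question; the container name and its position in the spec are illustrative, not taken from the actual manifest):

```yaml
# Sketch of the elasticsearch-data pod spec fragments shown above.
spec:
  tolerations:                      # lets the pod land on the tainted r5 nodes
    - key: "es-data"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:                   # restricts the pod to nodes labeled es-data=true
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: es-data
                operator: In
                values:
                  - "true"
  containers:
    - name: elasticsearch           # illustrative name
      resources:
        requests:
          cpu: 500m
          memory: 8Gi
        limits:
          cpu: 1
          memory: 8Gi
```

Note that nothing in this fragment limits how many data pods can share a node; the toleration and nodeAffinity only control *which* nodes are eligible.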
SOLUTION: Per Hakob's comment, the problem was that I had a requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule set. That is a hard requirement telling the scheduler to place only one data pod per node. To get multiple data pods scheduled per node, what I needed to do was change it to the soft preferredDuringSchedulingIgnoredDuringExecution form. If you are going to do this, take note of Hakob's advice on why it is not recommended best practice; I agree with that advice. In my case, though, this was a requirement from the client, and opting out was not an option even after discussing why they shouldn't do it. So keep that in mind when applying this change.
Posted on 2021-03-06 19:19:01
3 Insufficient memory, 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules.
This means the scheduler is trying to find a separate node for every one of the ES pods. But because the node count is too low to run one pod per node, the remaining pods stay Pending.
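The one-pod-per-node behavior described here typically comes from a hard inter-pod anti-affinity rule on the data pods; a sketch of what such a rule looks like (the `app` label key and value are assumptions based on common Elasticsearch chart defaults, not shown in the question):

```yaml
# Hard anti-affinity: no two pods matching the selector may share a
# hostname, so at most one data pod fits per node, regardless of how
# much free memory the node still has.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app              # assumed label key
              operator: In
              values:
                - elasticsearch     # assumed label value
        topologyKey: kubernetes.io/hostname
```

This also explains the "Insufficient memory" part of the event being a red herring: with 8 data pods and a hard rule like this, scheduling needs 8 eligible nodes, not more memory.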
For more information, read here.
So, from here, you have two choices -- Neo ))
https://devops.stackexchange.com/questions/13488