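My EKS cluster became unhealthy: all pods are stuck in "ContainerCreating", which may be related to a problem with the CNI.
Whenever I launch new worker nodes, they never reach the "Ready" state and report the following error: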
"couldn't get current server API group list; will keep using cached value. (Get https://172.20.0.1:443/api?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout) Failed to communicate with K8S Server. Please check instance security groups or http proxy setting"
I am not using an HTTP proxy, and the security groups allow traffic from the private CIDR (telnet to the API server on port 443 works).
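The telnet test and the failing request may not be exercising the same path. 172.20.0.1 looks like the in-cluster `kubernetes` service address (an assumption based on the default EKS service range), which is reached through rules programmed on the node, whereas telnet to the API server endpoint goes straight to the control plane. A minimal Python sketch for comparing the two from a worker node follows; `tcp_reachable` is a hypothetical helper name, not part of any AWS or Kubernetes tool.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, DNS failures,
        # and unreachable networks.
        return False
```

Running `tcp_reachable("172.20.0.1", 443)` and the same call against the cluster's API endpoint hostname on an affected node shows whether only the service address is timing out. If the endpoint answers but 172.20.0.1 does not, the security groups are probably fine and the node-local service routing deserves a closer look.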
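My CNI version is 1.5.5. Following some posts about this issue, I tried downgrading the CNI to 1.5.3 (the nodes still did not join) and to 1.5.1 (the nodes joined, because the /etc/cni/net.d/10-aws.conflist file existed, but pods could not connect to them).
In version 1.5.5 the location of the conflist file changed to /etc/cni/10-aws.conflist, but the nodes are still in the "NotReady" state.
My EKS version is 1.14 and the platform version is eks.2.
ipamd logs: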
2019-11-27T09:09:13.446Z [INFO] Starting L-IPAMD v1.5.5 ...
2019-11-27T09:09:43.447Z [INFO] Testing communication with server
2019-11-27T09:10:13.448Z [INFO] Failed to communicate with K8S Server. Please check instance security groups or http proxy setting
2019-11-27T09:10:13.448Z [ERROR] Failed to create client: error communicating with apiserver: Get https://172.20.0.1:443/version?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout
The errors from the containers are:
Warning FailedCreatePodSandBox 17m kubelet, ip-10-1-1-144.eu-west-1.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "b02f175d5e68011332655e0d6e6aa3ae226bbd7bf447c7461c0140a7e026d831" network for pod "coredns-759d6fc95f-zx292": NetworkPlugin cni failed to set up pod "coredns-759d6fc95f-zx292_kube-system" network: failed to find plugin "aws-cni" in path [/opt/cni/bin], failed to clean up sandbox container "b02f175d5e68011332655e0d6e6aa3ae226bbd7bf447c7461c0140a7e026d831" network for pod "coredns-759d6fc95f-zx292": NetworkPlugin cni failed to teardown pod "coredns-759d6fc95f-zx292_kube-system" network: failed to find plugin "aws-cni" in path [/opt/cni/bin]]
Normal SandboxChanged 2m47s (x70 over 17m) kubelet, ip-10-1-1-144.eu-west-1.compute.internal Pod sandbox changed, it will be killed and re-created.
CNI image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.5.5
Output of the /opt/cni/bin/aws-cni-support.sh script:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61678: Connection refused
tar: Removing leading `/' from member names
/var/log/aws-routed-eni/
/var/log/aws-routed-eni/ipamd.log.2019-11-27-09
/var/log/aws-routed-eni/ipamd.log.2019-11-27-10
/var/log/aws-routed-eni/eni.out
/var/log/aws-routed-eni/pod.out
/var/log/aws-routed-eni/networkutils-env.out
/var/log/aws-routed-eni/ipamd-env.out
/var/log/aws-routed-eni/eni-configs.out
/var/log/aws-routed-eni/metrics.out
/var/log/aws-routed-eni/ifconfig.out
/var/log/aws-routed-eni/iprule.out
/var/log/aws-routed-eni/iptables-save.out
/var/log/aws-routed-eni/iptables.out
/var/log/aws-routed-eni/iptables-nat.out
/var/log/aws-routed-eni/iptables-mangle.out
/var/log/aws-routed-eni/cni/
/var/log/aws-routed-eni/cni/10-aws.conflist
/var/log/aws-routed-eni/messages
/var/log/aws-routed-eni/route.out
/var/log/aws-routed-eni/sysctls.out
In addition, /var/log/aws-routed-eni/messages contains many occurrences of the following error: network: failed to find plugin \"aws-cni\" in path [/opt/cni/bin]
There is no /opt/cni/bin/aws-cni file on the node.
Does anyone know what the problem is?
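For anyone comparing nodes, the file checks described above can be scripted. This is only a sketch in Python: the two paths are the ones mentioned in this question, and `missing_files` and its `root` parameter are hypothetical names added for illustration (the `root` argument exists so the check can be pointed at a copied or mounted filesystem).

```python
from pathlib import Path

# Paths taken from the question; other CNI versions or AMIs may differ.
CNI_PATHS = [
    "/opt/cni/bin/aws-cni",
    "/etc/cni/net.d/10-aws.conflist",
]


def missing_files(paths, root="/"):
    """Return the subset of paths that do not exist as files under root."""
    root_path = Path(root)
    missing = []
    for p in paths:
        # Strip the leading slash so the path can be re-rooted.
        if not (root_path / p.lstrip("/")).is_file():
            missing.append(p)
    return missing
```

Calling `missing_files(CNI_PATHS)` on a node returns an empty list when both the plugin binary and the conflist are in place, and the list of absent paths otherwise.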
Posted on 2020-07-03 22:35:08
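I ran into the same problem, and in my case the cause was kube-proxy.
The aws-cni plugin is actually downloaded by the aws-node pods, so if they cannot connect to the master, that never happens and both the config file and the binary are missing. What fixed it for me was repairing the kube-proxy configuration (it was failing because of the --resource-container flag, which is no longer supported). This may not be the problem you are hitting, but I would definitely check the kube-proxy pods and look at their logs. Those logs are not available through kubectl logs ...; they are stored on the node in /var/log/kube-proxy.log.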
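Since that log lives on the node rather than behind kubectl, a small script can help when several nodes need checking. The Python sketch below simply lists lines that mention a given string; `lines_mentioning` is a hypothetical helper, and the default search string is only the flag name from this answer, since the exact wording kube-proxy uses for the error is not shown here.

```python
def lines_mentioning(log_path, needle="resource-container"):
    """Return (line_number, text) pairs for log lines containing needle."""
    hits = []
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(log_path, "r", encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

On an affected node, `lines_mentioning("/var/log/kube-proxy.log")` would return the matching lines, if any. An empty result only means the flag name does not appear in the file, so it is still worth reading the log directly for other startup errors.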
https://stackoverflow.com/questions/59066712