
EKS cluster API and node connectivity issue

Stack Overflow user
Asked on 2019-11-27 17:15:12
1 answer, 2.1K views, 0 followers, 0 votes

My EKS cluster has become unhealthy: every pod is stuck in "ContainerCreating", which may be related to a CNI problem.

When I launch new worker nodes, they never reach the "Ready" state and report the following error:

"couldn't get current server API group list; will keep using cached value. (Get https://172.20.0.1:443/api?timeout=32s: dial tcp
172.20.0.1:443: i/o timeout) Failed to communicate with K8S Server. Please check instance security groups or http proxy setting"

I am not using an HTTP proxy, and the security groups allow traffic from the private CIDR (telnet to the API server on port 443 works).
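The telnet check mentioned above can also be run from a node as a small script. This is a minimal sketch, not part of the original question: the function name can_connect is hypothetical, and it only tests whether a TCP connection can be opened. The address 172.20.0.1:443 comes from the error message; it appears to be the in-cluster service address, so it may behave differently from the API server endpoint that was tested with telnet.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example usage on a worker node (address taken from the error message):
#   print(can_connect("172.20.0.1", 443))
```

If the API server endpoint is reachable but 172.20.0.1:443 is not, the problem is more likely on the node itself than in the security groups.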

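My CNI version is 1.5.5. Based on several posts about this issue, I tried downgrading the CNI to 1.5.3 (the nodes still did not join) and to 1.5.1 (the nodes joined, because the /etc/cni/net.d/10-aws.conflist file exists, but pods could not connect to them).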

In version 1.5.5 the conflist file location changed to /etc/cni/10-aws.conflist, but the nodes remain in the "NotReady" state.

My EKS version is 1.14, and the platform version is eks.2.

ipamd log:

2019-11-27T09:09:13.446Z [INFO] Starting L-IPAMD v1.5.5  ...
2019-11-27T09:09:43.447Z [INFO] Testing communication with server
2019-11-27T09:10:13.448Z [INFO] Failed to communicate with K8S Server. Please check instance security groups or http proxy setting
2019-11-27T09:10:13.448Z [ERROR]        Failed to create client: error communicating with apiserver: Get https://172.20.0.1:443/version?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout

The error from the containers is:

Warning  FailedCreatePodSandBox  17m                   kubelet, ip-10-1-1-144.eu-west-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "b02f175d5e68011332655e0d6e6aa3ae226bbd7bf447c7461c0140a7e026d831" network for pod "coredns-759d6fc95f-zx292": NetworkPlugin cni failed to set up pod "coredns-759d6fc95f-zx292_kube-system" network: failed to find plugin "aws-cni" in path [/opt/cni/bin], failed to clean up sandbox container "b02f175d5e68011332655e0d6e6aa3ae226bbd7bf447c7461c0140a7e026d831" network for pod "coredns-759d6fc95f-zx292": NetworkPlugin cni failed to teardown pod "coredns-759d6fc95f-zx292_kube-system" network: failed to find plugin "aws-cni" in path [/opt/cni/bin]]
  Normal   SandboxChanged          2m47s (x70 over 17m)  kubelet, ip-10-1-1-144.eu-west-1.compute.internal  Pod sandbox changed, it will be killed and re-created.

CNI image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.5.5

Output of the /opt/cni/bin/aws-cni-support.sh script:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 61679: Connection refused
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 61679: Connection refused
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 61679: Connection refused
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 61679: Connection refused
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 61679: Connection refused
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 61678: Connection refused
tar: Removing leading `/' from member names
/var/log/aws-routed-eni/
/var/log/aws-routed-eni/ipamd.log.2019-11-27-09
/var/log/aws-routed-eni/ipamd.log.2019-11-27-10
/var/log/aws-routed-eni/eni.out
/var/log/aws-routed-eni/pod.out
/var/log/aws-routed-eni/networkutils-env.out
/var/log/aws-routed-eni/ipamd-env.out
/var/log/aws-routed-eni/eni-configs.out
/var/log/aws-routed-eni/metrics.out
/var/log/aws-routed-eni/ifconfig.out
/var/log/aws-routed-eni/iprule.out
/var/log/aws-routed-eni/iptables-save.out
/var/log/aws-routed-eni/iptables.out
/var/log/aws-routed-eni/iptables-nat.out
/var/log/aws-routed-eni/iptables-mangle.out
/var/log/aws-routed-eni/cni/
/var/log/aws-routed-eni/cni/10-aws.conflist
/var/log/aws-routed-eni/messages
/var/log/aws-routed-eni/route.out
/var/log/aws-routed-eni/sysctls.out

In addition, /var/log/aws-routed-eni/messages contains many errors like the following: network: failed to find plugin "aws-cni" in path [/opt/cni/bin]

There is no /opt/cni/bin/aws-cni file.
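To confirm on a node which of the CNI files named in this question are actually present, something like the following sketch could be used. The helper name missing_cni_files and the root parameter are hypothetical (root exists only so the check can be pointed at a copied directory tree), and the two paths are the ones mentioned above, which may differ on other node images or CNI versions.

```python
from pathlib import Path

# Paths taken from the question; adjust them for your node image.
CNI_PATHS = [
    "/opt/cni/bin/aws-cni",
    "/etc/cni/net.d/10-aws.conflist",
]


def missing_cni_files(paths=CNI_PATHS, root="/"):
    """Return the entries of `paths` that are not regular files under `root`."""
    root_path = Path(root)
    # Strip the leading slash so each path is resolved relative to `root`.
    return [p for p in paths if not (root_path / p.lstrip("/")).is_file()]


# Example usage on a worker node:
#   for path in missing_cni_files():
#       print("missing:", path)
```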

Does anyone know what the problem is?


1 Answer

Stack Overflow user

Answered on 2020-07-03 22:35:08

I ran into the same problem, and the cause turned out to be kube-proxy.

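You see, the aws-cni plugin is actually downloaded by the aws-node pods, so if they cannot connect to the master, that never happens and both the config file and the binary go missing. What fixed it for me was fixing the kube-proxy configuration (it was failing because of the now-unsupported flag --resource-container). This may not be the problem you are hitting, but I would definitely check the kube-proxy pods and look at their logs. Those logs are not available via kubectl logs ..., but are stored on the node in /var/log/kube-proxy.log.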
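As a rough way to follow this advice, the sketch below searches a kube-proxy log for mentions of the flag named in this answer. It assumes nothing about the log format and only does a substring match; the function name find_flag_mentions is hypothetical, and the log path is the one given above.

```python
def find_flag_mentions(lines, flag="resource-container"):
    """Return (line_number, text) pairs for every line that mentions `flag`."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if flag in line
    ]


# Example usage on a worker node (path taken from the answer):
#   with open("/var/log/kube-proxy.log", errors="replace") as log:
#       for number, text in find_flag_mentions(log):
#           print(number, text)
```

An empty result only means this particular flag is not mentioned; the log may still show a different kube-proxy failure.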

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/59066712
