I am running a Kubernetes cluster that has been up for several months. Now, as I was about to deploy some updates, I am getting timeouts from the server.
Running $ kubectl get nodes yields:
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
Running $ kubectl get pods --all-namespaces yields:
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get pods)
Running $ kubectl get deployments yields:
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get deployments.extensions)
Running $ kubectl get svc yields:
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get services)
Running $ kubectl cluster-info yields the following (note that there is no output after the master line):
Kubernetes master is running at https://cluster.mysite.com
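To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
With every command timing out like this, troubleshooting is impossible.
How can I regain access to my cluster from here? I am using kube-aws with an AWS CloudFormation VPC.
Sorry for taking up your time.
Edit
As requested, I ran $ kubectl get pods -v 7 and, after a batch of cached responses, got the following: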
I0103 16:51:32.196859 25644 round_trippers.go:414] GET cluster.mysite.com/api/v1/nodes
I0103 16:51:32.196888 25644 round_trippers.go:421] Request Headers:
I0103 16:51:32.196894 25644 round_trippers.go:424] Accept: application/json
I0103 16:51:32.196899 25644 round_trippers.go:424] User-Agent: kubectl/v1.8.3 (darwin/amd64) kubernetes/f0efb3c
I0103 16:52:32.239841 25644 round_trippers.go:439] Response Status: 504 Gateway Timeout in 60044 milliseconds
I also ran $ kubectl cluster-info dump -v 7 and got:
I0103 16:51:32.196888 25644 round_trippers.go:421] Request Headers:
I0103 16:51:32.196894 25644 round_trippers.go:424] Accept: application/json
I0103 16:51:32.196899 25644 round_trippers.go:424] User-Agent: kubectl/v1.8.3 (darwin/amd64) kubernetes/f0efb3c
I0103 16:52:32.239841 25644 round_trippers.go:439] Response Status: 504 Gateway Timeout in 60044 milliseconds
I0103 16:52:32.242362 25644 helpers.go:207] server response object: [{
"metadata": {},
"status": "Failure",
"message": "the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)",
"reason": "Timeout",
"details": {
"kind": "nodes",
"causes": [
{
"reason": "UnexpectedServerResponse",
"message": "{\"metadata\":{},\"status\":\"Failure\",\"message\":\"The list operation against nodes could not be completed at this time, please try again.\",\"reason\":\"ServerTimeout\",\"details\":{\"name\":\"list\",\"kind\":\"nodes\"},\"code\":500}"
}
]
},
"code": 504
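}]
Edit 2: Now I am getting Unable to connect to the server: EOF on every request, and I am starting to get scared. This is a production cluster and I cannot even reach it to troubleshoot. Does anyone know what to do?
Edit 3: I have realized that the etcd cluster is not working because 2 of its 3 nodes are out of sync. After I restarted one node it rejoined the cluster correctly, but the second node fails to start some services. The services that fail to start are:
The first three give the error etcdadm-check.service: Control process exited, code=exited status=3, and the last one gives user@0.service: Start request repeated too quickly.
Any suggestions on how to handle this?
Also, after restoring the second etcd node, I get Unable to connect to the server: x509: certificate signed by unknown authority when running any kubectl command. Does this mean data was lost? My certificates are still valid for more than half a year, and I have not changed them.
Edit 4: I still have the etcd problem, but I am now following the instructions in camil's answer and will update with the results. I did resolve the invalid-certificate problem, though: re-running $ kube-aws render credentials with the correct path to the intermediate root CA fixed it.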
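Answered 2018-01-05 00:01:22
To avoid waiting on the long timeouts, you can pass the flag --request-timeout='1s' so that each request gives up quickly. This allows further debugging.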
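As a rough sketch of how that flag might be used while the API server is misbehaving, the function below runs a few read-only queries, each bounded by --request-timeout, and reports which ones answer. The function name probe_api, the KUBECTL override variable and the particular list of resources are my own choices for illustration, not something kube-aws or kubectl provides.

```shell
# KUBECTL can be overridden (for example with a wrapper); by default the
# kubectl found on PATH is used. This variable is an assumption of the sketch.
KUBECTL="${KUBECTL:-kubectl}"

# probe_api [TIMEOUT]: run a few read-only queries, each limited to TIMEOUT
# (default 5s), and print which ones succeed. Returns 1 if any query fails.
probe_api() {
  probe_timeout="${1:-5s}"
  probe_failed=0
  for resource in nodes componentstatuses "pods --all-namespaces"; do
    # $resource is intentionally unquoted so "pods --all-namespaces" splits
    # into two arguments.
    if "$KUBECTL" get $resource --request-timeout="$probe_timeout" >/dev/null 2>&1; then
      echo "ok: $resource"
    else
      echo "FAILED: $resource"
      probe_failed=1
    fi
  done
  return "$probe_failed"
}
```

Calling probe_api 1s mirrors the value suggested above; a slightly longer value such as 5s may be more forgiving if the server is only slow rather than down.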
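I see you are running kube-aws, so it is safe to terminate the master instances (at least one of them, if you run several). The ASG (Auto Scaling Group) will replace them automatically. You can do the same with the etcd nodes.
If the problem persists, you have to SSH into the master and check the logs and services by running the following commands: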
journalctl -xe
systemctl status -l kubelet.service
systemctl status -l flanneld.service
systemctl status -l docker.service
rkt list
You can also use this function to debug with kubectl from inside the master:
kubectl() {
  /usr/bin/docker run --rm --net=host \
    -v /etc/resolv.conf:/etc/resolv.conf \
    -v /srv/kube-aws/plugins:/srv/kube-aws/plugins \
    quay.io/coreos/hyperkube:v1.9.0_coreos.0 /hyperkube kubectl "$@"
}
Then try the following commands:
kubectl get componentstatus
kubectl cluster-info
kubectl get pods -n kube-system
kubectl get events -n kube-system
Check connectivity from the master to etcd:
export $(cat /etc/etcd-environment | tr -d "'")
/usr/bin/etcdctl \
--ca-file=/etc/kubernetes/ssl/etcd-trusted-ca.pem \
--cert-file=/etc/kubernetes/ssl/etcd-client.pem \
--key-file=/etc/kubernetes/ssl/etcd-client-key.pem \
--endpoints="${ETCD_ENDPOINTS}" \
cluster-health
Answered 2021-05-28 19:45:37
rm -r ~/.kube/cache/discovery worked for me.
My timeout message looked different from yours, though:
E0528 20:32:29.191243 1730 request.go:975] Unexpected error when reading response body: net/http: request canceled (Client.Timeout exceeded while reading body)
https://stackoverflow.com/questions/48080235