My AKS cluster has 3 nodes, and I tried to manually scale out from 3 nodes to 4. The scale-out itself worked, but about 20 minutes later all 4 nodes were in NotReady status and all of the kube-system services were not ready.
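For context, a manual scale-out like the one described is typically done with `az aks scale`; the resource group and cluster names below are placeholders, not taken from the question:

```shell
# Manually scale the default node pool from 3 to 4 nodes.
# myResourceGroup / myAKSCluster are assumed placeholder names.
az aks scale \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 4
```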
NAME STATUS ROLES AGE VERSION
aks-agentpool-40760006-vmss000000 Ready agent 16m v1.18.14
aks-agentpool-40760006-vmss000001 Ready agent 17m v1.18.14
aks-agentpool-40760006-vmss000002 Ready agent 16m v1.18.14
aks-agentpool-40760006-vmss000003 Ready agent 11m v1.18.14
About 20 minutes later:
NAME STATUS ROLES AGE VERSION
aks-agentpool-40760006-vmss000000 NotReady agent 23m v1.18.14
aks-agentpool-40760006-vmss000002 NotReady agent 24m v1.18.14
aks-agentpool-40760006-vmss000003 NotReady agent 19m v1.18.14
k get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-748cdb7bf4-7frq2 0/1 Pending 0 10m
coredns-748cdb7bf4-vg5nn 0/1 Pending 0 10m
coredns-748cdb7bf4-wrhxs 1/1 Terminating 0 28m
coredns-autoscaler-868b684fd4-2gb8f 0/1 Pending 0 10m
kube-proxy-p6wmv 1/1 Running 0 28m
kube-proxy-sksz6 1/1 Running 0 23m
kube-proxy-vpb2g 1/1 Running 0 28m
metrics-server-58fdc875d5-sbckj 0/1 Pending 0 10m
tunnelfront-5d74798f6b-w6rvn 0/1 Pending 0 10m

The node events show:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 25m kubelet Starting kubelet.
Normal NodeHasSufficientMemory 25m (x2 over 25m) kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 25m (x2 over 25m) kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 25m (x2 over 25m) kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 25m kubelet Updated Node Allocatable limit across pods
Normal Starting 25m kube-proxy Starting kube-proxy.
Normal NodeReady 24m kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeReady
Warning FailedToCreateRoute 5m5s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 50.264754ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m55s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 45.945658ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m45s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 46.180158ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m35s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 46.550858ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m25s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 44.74355ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m15s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 42.428456ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m5s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 41.664858ms: timed out waiting for the condition
Warning FailedToCreateRoute 3m55s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 48.456954ms: timed out waiting for the condition
Warning FailedToCreateRoute 3m45s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 38.611964ms: timed out waiting for the condition
Warning FailedToCreateRoute 65s (x16 over 3m35s) route_controller (combined from similar events): Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 13.972487ms: timed out waiting for the condition

Posted on 2021-02-04 20:54:55
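The repeated FailedToCreateRoute warnings indicate the route controller could not write the node's pod CIDR route (10.244.2.0/24, i.e. kubenet networking) into the cluster's route table, which commonly happens when the cluster identity lacks permissions on it. A hedged way to inspect this from the CLI (the MC_ node resource group name below is an assumed placeholder, not the real one):

```shell
# List route tables in the node resource group.
# MC_myResourceGroup_myAKSCluster_eastus is an assumed placeholder name.
az network route-table list \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --output table

# Check which role assignments exist on that resource group; for kubenet the
# cluster identity needs write access (e.g. Network Contributor) to the route table.
az role assignment list \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --output table
```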
You can use the cluster autoscaler option to avoid this kind of situation in the future.
To keep up with application demands in Azure Kubernetes Service (AKS), you may need to adjust the number of nodes that run your workloads. The cluster autoscaler component watches for pods in your cluster that can't be scheduled because of resource constraints. When it detects issues, it increases the number of nodes in the node pool to meet the application demand. It also regularly checks nodes for a lack of running pods and decreases the number of nodes as needed. This ability to automatically scale the number of nodes in your AKS cluster up or down lets you run an efficient, cost-effective cluster.
You can use "Update an existing AKS cluster to enable the cluster autoscaler" with your current resource group:
az aks update \
--resource-group myResourceGroup \
--name myAKSCluster \
--enable-cluster-autoscaler \
--min-count 1 \
  --max-count 3

Posted on 2021-02-05 02:15:28
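Once the autoscaler is enabled, its behavior can be observed from inside the cluster; a sketch of two checks (the status configmap is published by the upstream cluster-autoscaler and is an assumption for this AKS version):

```shell
# Watch nodes being added or removed as the autoscaler scales the pool.
kubectl get nodes --watch

# If exposed by the cluster, the autoscaler publishes a status configmap
# in kube-system summarizing its recent scale-up/scale-down decisions.
kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
```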
It looks okay now. I did not have the permission to scale out the nodes.
https://stackoverflow.com/questions/66042285