I am running microk8s on two nodes. Recently the master node stopped reaching the Ready state because the microk8s.daemon-containerd service fails to start. This happened after I tried to run a cert-manager configuration in the k8s cluster.
As far as I can tell, the cert-manager-webhook pod is running fine on the second node.
I have already tried microk8s stop / microk8s start. I even tried microk8s reset, but containerd keeps failing with the same error.
Output:
$ kubectl get node
NAME        STATUS     ROLES    AGE   VERSION
pi-k8s-00   NotReady   <none>   77d   v1.18.6-1+b4f4cb0b7fe3c1
pi-k8s-01   Ready      <none>   77d   v1.19.2-34+37bbd8cebecb60

$ kubectl get pod -n cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-676b755d5f-6bjxv              1/1     Running   0          12m
cert-manager-cainjector-795f67b984-tsmw9   1/1     Running   3          12m
cert-manager-webhook-86c4dcd4b5-bgrmb      1/1     Running   0          12m

$ sudo journalctl -u snap.microk8s.daemon-containerd
...
Oct 17 10:42:33 pi-k8s-00 microk8s.daemon-containerd[44363]: time="2020-10-17T10:42:33.848409047Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"cert-manager-webhook>
Oct 17 10:42:33 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 10:42:33 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Scheduled restart job, restart counter is at 5.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: Stopped Service for snap application microk8s.daemon-containerd.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Start request repeated too quickly.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.
Oct 17 10:42:34 pi-k8s-00 systemd[1]: Failed to start Service for snap application microk8s.daemon-containerd.

$ uname -a
Linux pi-k8s-00 5.4.0-1021-raspi #24-Ubuntu SMP PREEMPT Mon Oct 5 09:59:23 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

How do I get the master node back into a good Running/Ready state?
- Update
Output:
$ less /var/snap/microk8s/current/inspection-report/snap.microk8s.daemon-containerd/journal.log
Oct 18 14:48:03 pi-k8s-00 microk8s.daemon-containerd[239043]: time="2020-10-18T14:48:03.936439781Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve sandbox name \"cert-manager-webhook-64b9b4fdfd-9d6tm_cert-manager_81fb08ac-7e87-42bd-9123-b0b8b098fe50_3\": name \"cert-manager-webhook-64b9b4fdfd-9d6tm_cert-manager_81fb08ac-7e87-42bd-9123-b0b8b098fe50_3\" is reserved for \"149b0aa92e3eb042f87353ead44a7247e756c8071f804bfbec3b781a5565e52c\""

The last log line says the sandbox name is reserved for a given id.
What id is that? Where do I go, and what is one supposed to do to release it?
While looking through the comments in 'failed to reserve sandbox name' error after hard reboot #1014, I tried:
$ sudo ctr -n=k8s.io containers info 149b0aa92e3eb042f87353ead44a7247e756c8071f804bfbec3b781a5565e52c
ctr: container "149b0aa92e3eb042f87353ead44a7247e756c8071f804bfbec3b781a5565e52c" in namespace "k8s.io": not found

But judging from that output, no container with that id exists?
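Beyond inspecting a single id, one way to investigate (not from the original post, just a sketch) is to enumerate everything containerd still has on record in the CRI namespace `k8s.io` and look for stale entries. The `CTR` variable defaults to a dry-run `echo` here; on the affected node, run with `CTR="sudo ctr"` to execute for real.

```shell
# Sketch: list containerd's records in the CRI namespace (k8s.io) to look
# for stale entries matching the id from the error message.
# CTR defaults to a dry-run echo so the commands are only printed;
# set CTR="sudo ctr" on the node to actually run them.
CTR=${CTR:-"echo sudo ctr"}

$CTR -n k8s.io containers list   # container metadata records
$CTR -n k8s.io tasks list        # running tasks (can differ from the metadata)
```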
Answered on 2020-10-24 18:59:01
It appears the containerd data was corrupted, so the fix was to have the containerd data recreated by doing the following:
$ microk8s.stop
$ mv /var/snap/microk8s/common/var/lib/containerd /var/snap/microk8s/common/var/lib/_containerd
$ microk8s.start

The Kubernetes master node reports the Ready status again:
$ kubectl get node
NAME        STATUS   ROLES    AGE   VERSION
pi-k8s-00   Ready    <none>   84d   v1.19.2-34+37bbd8cebecb60
pi-k8s-01   Ready    <none>   84d   v1.19.2-34+37bbd8cebecb60

For more details, see my post on the microk8s GitHub issue page, Failed to Reserve Sandbox Name.
https://stackoverflow.com/questions/64401699