
Kubelet process shows high CPU usage for a long time

Stack Overflow user
Asked 2017-05-19 07:21:07
3 answers · 9K views · 0 followers · Score 3

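I have a Kubernetes cluster made up of three nodes:

  • 1 master node (virtual machine)
  • 2 bare-metal worker nodes (4-core Xeon with hyper-threading, 8 logical cores)

The problem is that top shows kubelet using 60-100% CPU on the first worker. In journalctl -u kubelet I see a lot of messages (hundreds per minute):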

May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075243    3843 docker_sandbox.go:205] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640": Error response from daemon: {"message":"No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075360    3843 remote_runtime.go:109] StopPodSandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075380    3843 kuberuntime_gc.go:138] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076549    3843 docker_sandbox.go:205] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf": Error response from daemon: {"message":"No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076654    3843 remote_runtime.go:109] StopPodSandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076676    3843 kuberuntime_gc.go:138] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079585    3843 docker_sandbox.go:205] Failed to stop sandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772": Error response from daemon: {"message":"No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079805    3843 remote_runtime.go:109] StopPodSandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-r30cw_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772

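This started after I mistakenly deleted a job that had failed during creation. I deleted all of its pods with --force, but kubelet still tries to remove them. I also restarted kubelet on that worker, with no effect. How can I tell kubelet to forget about them?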
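
To see how many distinct sandboxes kubelet keeps retrying, the IDs can be pulled out of the log lines above. The following is a minimal Python sketch, not part of the original question; the function name `stale_sandbox_ids` is hypothetical, and the pattern assumes the message format shown in the log excerpt (a 64-character hex ID in double quotes).

```python
import re
from typing import Iterable, Set

# Assumption: matches the 'Failed to stop sandbox "<id>"' text that appears in
# the docker_sandbox.go and kuberuntime_gc.go messages quoted in the question.
SANDBOX_RE = re.compile(r'Failed to stop sandbox "([0-9a-f]{64})"')


def stale_sandbox_ids(log_lines: Iterable[str]) -> Set[str]:
    """Return the distinct sandbox IDs named in 'Failed to stop sandbox' errors."""
    ids = set()
    for line in log_lines:
        match = SANDBOX_RE.search(line)
        if match:
            ids.add(match.group(1))
    return ids
```

One way to use it would be to pipe the output of journalctl -u kubelet into a small script that passes sys.stdin to `stale_sandbox_ids` and prints the result; each ID is reported once, even though every sandbox appears in several log lines.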

Version info

Kubernetes v1.6.1
Docker version 1.12.0, build 8eab29e
Linux kube-worker1 4.4.0-72-generic #93-Ubuntu SMP

Container manifest (without metadata)

  job:
    apiVersion: batch/v1
    kind: Job
    spec:
      template:
        spec:
          containers:
          - name: cron-task
            image: docker.company.ru/image:v2.3.2
            command: ["rake", "db:refresh_views"]
            env:
            - name: RAILS_ENV
              value: namespace
            - name: CONFIG_PATH
              value: /config
            volumeMounts:
            - name: config
              mountPath: /config
          volumes:
          - name: config
            configMap:
              name: task-conf
          restartPolicy: Never

Also, I found nothing in the cluster's etcd matching the partial pod name (2533948c46c1).


3 Answers

Stack Overflow user

Accepted answer

Posted 2017-05-25 06:46:01

I finally found a solution.

Kubelet stores information about all pods running on the node in

/var/lib/dockershim/sandbox

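When I ran ls in that folder, I found files for all of the missing pods. I then deleted those files; the log messages stopped and CPU usage returned to normal (even without restarting kubelet).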
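
The manual cleanup described in this answer could be preceded by a dry-run check along these lines. This is a rough Python sketch under stated assumptions, not the answerer's procedure: it assumes each file in the sandbox directory is named after its sandbox (container) ID, and the helper names `live_container_ids` and `orphaned_checkpoints` are hypothetical. It only lists candidates and deletes nothing.

```python
import os
import subprocess
from typing import List, Set

SANDBOX_DIR = "/var/lib/dockershim/sandbox"  # path given in the answer above


def live_container_ids() -> Set[str]:
    """Full IDs of every container Docker still knows about (running or stopped)."""
    out = subprocess.run(
        ["docker", "ps", "-a", "-q", "--no-trunc"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def orphaned_checkpoints(sandbox_dir: str, live_ids: Set[str]) -> List[str]:
    """Paths of files in sandbox_dir whose name is not a known container ID.

    Assumption: each checkpoint file is named after its sandbox ID.
    """
    return sorted(
        os.path.join(sandbox_dir, name)
        for name in os.listdir(sandbox_dir)
        if os.path.isfile(os.path.join(sandbox_dir, name)) and name not in live_ids
    )
```

Calling `orphaned_checkpoints(SANDBOX_DIR, live_container_ids())` on the affected worker would print nothing by itself; it returns the list of paths to review before removing anything by hand, which keeps the destructive step separate from the detection step.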

Score: 5

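Stack Overflow user

Posted 2017-05-19 10:50:01

This appears to be related to the Kubernetes 1.6.x issue "Pods with hostNetwork=true cannot be removed (and generate errors) when using CNI". In any case, these messages are not critical, but they are certainly annoying when you are trying to find the actual problem. Try the latest Kubernetes release to mitigate this.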

Score: 0

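Stack Overflow user

Posted 2017-06-17 14:04:27

I ran into the same problem as you. After analyzing it, I found that the cause was this kubelet mechanism, and I resolved it by deleting '/var/lib/dockershim/sandbox'.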

Score: 0
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/44063870
