
Sudden failure due to disk pressure

Server Fault user
Asked 2019-05-12 04:31:58
Answers: 1, Views: 5.2K, Followers: 0, Votes: 2

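We have an EKS cluster with two t3.small nodes, each with 20 Gi of ephemeral storage. The cluster currently runs only two small Node.js (node:12-alpine) applications.

This worked fine for several weeks, and now we are suddenly seeing disk pressure errors.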

$ kubectl describe nodes
Name:               ip-192-168-101-158.ap-southeast-1.compute.internal
Roles:              
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.small
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ap-southeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-southeast-1a
                    kubernetes.io/hostname=ip-192-168-101-158.ap-southeast-1.compute.internal
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 31 Mar 2019 17:14:58 +0800
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:14:58 +0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:14:58 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Sun, 12 May 2019 12:22:47 +0800   Sun, 12 May 2019 06:51:38 +0800   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:14:58 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:15:31 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.101.158
  ExternalIP:   54.169.250.255
  InternalDNS:  ip-192-168-101-158.ap-southeast-1.compute.internal
  ExternalDNS:  ec2-54-169-250-255.ap-southeast-1.compute.amazonaws.com
  Hostname:     ip-192-168-101-158.ap-southeast-1.compute.internal
Capacity:
 attachable-volumes-aws-ebs:  25
 cpu:                         2
 ephemeral-storage:           20959212Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      2002320Ki
 pods:                        11
Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         2
 ephemeral-storage:           19316009748
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      1899920Ki
 pods:                        11
System Info:
 Machine ID:                 ec2aa2ecfbbbdd798e2da086fc04afb6
 System UUID:                EC2AA2EC-FBBB-DD79-8E2D-A086FC04AFB6
 Boot ID:                    62c5eb9d-5f19-4558-8883-2da48ab1969c
 Kernel Version:             4.14.106-97.85.amzn2.x86_64
 OS Image:                   Amazon Linux 2
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.6.1
 Kubelet Version:            v1.12.7
 Kube-Proxy Version:         v1.12.7
ProviderID:                  aws:///ap-southeast-1a/i-0a38342b60238d83e
Non-terminated Pods:         (0 in total)
  Namespace                  Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests  Limits
  --------                    --------  ------
  cpu                         0 (0%)    0 (0%)
  memory                      0 (0%)    0 (0%)
  ephemeral-storage           0 (0%)    0 (0%)
  attachable-volumes-aws-ebs  0         0
Events:
  Type     Reason                Age                    From                                                         Message
  ----     ------                ----                   ----                                                         -------
  Warning  ImageGCFailed         5m15s (x333 over 40h)  kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 1423169945 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet  17s (x2809 over 3d4h)  kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal  Attempting to reclaim ephemeral-storage


Name:               ip-192-168-197-198.ap-southeast-1.compute.internal
Roles:              
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.small
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ap-southeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-southeast-1c
                    kubernetes.io/hostname=ip-192-168-197-198.ap-southeast-1.compute.internal
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 31 Mar 2019 17:15:02 +0800
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Sun, 12 May 2019 12:22:42 +0800   Thu, 09 May 2019 06:50:56 +0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Sun, 12 May 2019 12:22:42 +0800   Thu, 09 May 2019 06:50:56 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Sun, 12 May 2019 12:22:42 +0800   Sat, 11 May 2019 21:53:44 +0800   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Sun, 12 May 2019 12:22:42 +0800   Sun, 31 Mar 2019 17:15:02 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 12 May 2019 12:22:42 +0800   Thu, 09 May 2019 06:50:56 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.197.198
  ExternalIP:   13.229.138.38
  InternalDNS:  ip-192-168-197-198.ap-southeast-1.compute.internal
  ExternalDNS:  ec2-13-229-138-38.ap-southeast-1.compute.amazonaws.com
  Hostname:     ip-192-168-197-198.ap-southeast-1.compute.internal
Capacity:
 attachable-volumes-aws-ebs:  25
 cpu:                         2
 ephemeral-storage:           20959212Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      2002320Ki
 pods:                        11
Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         2
 ephemeral-storage:           19316009748
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      1899920Ki
 pods:                        11
System Info:
 Machine ID:                 ec27ee0765e86a14ed63d771073e63fb
 System UUID:                EC27EE07-65E8-6A14-ED63-D771073E63FB
 Boot ID:                    7869a0ee-dc2f-4082-ae3f-42c5231ab0e3
 Kernel Version:             4.14.106-97.85.amzn2.x86_64
 OS Image:                   Amazon Linux 2
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.6.1
 Kubelet Version:            v1.12.7
 Kube-Proxy Version:         v1.12.7
ProviderID:                  aws:///ap-southeast-1c/i-0bd4038f4dade284e
Non-terminated Pods:         (0 in total)
  Namespace                  Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests  Limits
  --------                    --------  ------
  cpu                         0 (0%)    0 (0%)
  memory                      0 (0%)    0 (0%)
  ephemeral-storage           0 (0%)    0 (0%)
  attachable-volumes-aws-ebs  0         0
Events:
  Type     Reason                Age                      From                                                         Message
  ----     ------                ----                     ----                                                         -------
  Warning  EvictionThresholdMet  5m40s (x4865 over 3d5h)  kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal  Attempting to reclaim ephemeral-storage
  Warning  ImageGCFailed         31s (x451 over 45h)      kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 4006422937 bytes, but freed 0 bytes
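
As a rough cross-check of the events above, the amount the kubelet "wanted to free" can be turned back into an estimate of the free space on each node. This is only a sketch under stated assumptions: the kubelet uses its default image garbage collection low threshold of 80% (`--image-gc-low-threshold`), images live on the same filesystem that is reported as `ephemeral-storage`, and the amount to free is computed as capacity * (100 - threshold) / 100 - available. The function names are hypothetical.

```shell
# estimate_available_bytes CAPACITY_KI WANTED_BYTES [LOW_THRESHOLD_PERCENT]
# Inverts the assumed formula:
#   wanted = capacity * (100 - low_threshold) / 100 - available
estimate_available_bytes() {
  local capacity_bytes=$(( $1 * 1024 ))
  local low_threshold=${3:-80}   # assumed kubelet default
  echo $(( capacity_bytes * (100 - low_threshold) / 100 - $2 ))
}

# percent_of_capacity BYTES CAPACITY_KI
# Integer percentage of the capacity, rounded down.
percent_of_capacity() {
  echo $(( $1 * 100 / ($2 * 1024) ))
}

capacity_ki=20959212   # ephemeral-storage capacity reported for both nodes

# ip-192-168-101-158: "Wanted to free 1423169945 bytes"
node1_free=$(estimate_available_bytes "$capacity_ki" 1423169945)
# ip-192-168-197-198: "Wanted to free 4006422937 bytes"
node2_free=$(estimate_available_bytes "$capacity_ki" 4006422937)

echo "node 1: ${node1_free} bytes free ($(percent_of_capacity "$node1_free" "$capacity_ki")%)"
echo "node 2: ${node2_free} bytes free ($(percent_of_capacity "$node2_free" "$capacity_ki")%)"
```

Under these assumptions the first node has about 2.7 GiB free (roughly 13%) and the second under 300 MiB (roughly 1%). Both values are below the kubelet's default hard eviction threshold of `imagefs.available<15%`, which would be consistent with the `DiskPressure` condition. The "freed 0 bytes" part suggests the kubelet found no unused images it could delete, so the space may be held by something other than stale images.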

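I'm not entirely sure how to debug this, but it feels like K8s is unable to delete old, unused Docker images on the nodes. Is there a way to verify this hypothesis? Any other ideas?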
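
One way to test this hypothesis is to look at disk usage directly on an affected node. The sketch below assumes shell access to the node (for example over SSH), Docker as the container runtime (as reported in the output above), and Docker's default data directory `/var/lib/docker`. The function name `report_disk_usage` and the `DOCKER_ROOT` variable are made up for illustration.

```shell
# Sketch: run on an affected node to see what is using the disk.
# Assumption: Docker keeps its data under /var/lib/docker (its default location).
DOCKER_ROOT="${DOCKER_ROOT:-/var/lib/docker}"

report_disk_usage() {
  echo "== Filesystem holding ${DOCKER_ROOT} =="
  df -h "${DOCKER_ROOT}"

  if command -v docker >/dev/null 2>&1; then
    # Totals for images, containers, local volumes and build cache,
    # including how much of each Docker considers reclaimable.
    echo "== Docker disk usage by type =="
    docker system df

    # Dangling images are untagged layers that no tagged image refers to.
    echo "== Dangling images =="
    docker images --filter dangling=true
  else
    echo "docker CLI not found; skipping Docker-specific checks" >&2
  fi
}
```

After sourcing the snippet on the node, calling `report_disk_usage` prints the three reports. If images account for little of the used space, the writable layers and logs of running containers are other places worth checking.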


1 Answer

Server Fault user

Answered 2020-05-29 08:23:53

This is my workaround:

kubectl drain --delete-local-data --ignore-daemonsets $NODE_NAME && kubectl uncordon $NODE_NAME  

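It drains the node, which deletes all local data and evicts all pods, and then makes the node schedulable again so the pods are re-run. However, I am still looking for the root cause.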
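
The workaround above can be wrapped in a small helper. This is only a sketch: the function name `recycle_node` and the `DRY_RUN` switch are made up for illustration, and the flags are the ones used in the answer. Newer kubectl releases renamed `--delete-local-data` to `--delete-emptydir-data`, so the flag may need adjusting depending on the client version.

```shell
# recycle_node NODE_NAME
# Drains the node (evicting its pods and discarding emptyDir data), then
# marks it schedulable again. With DRY_RUN=1 the commands are only printed.
recycle_node() {
  local node_name=$1
  local run=""

  if [ -z "$node_name" ]; then
    echo "usage: recycle_node NODE_NAME" >&2
    return 1
  fi

  # In dry-run mode, prefix each command with echo instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi

  # Uncordon only if the drain succeeded, as in the original one-liner.
  $run kubectl drain --delete-local-data --ignore-daemonsets "$node_name" &&
    $run kubectl uncordon "$node_name"
}
```

Running `DRY_RUN=1 recycle_node "$NODE_NAME"` prints the two kubectl commands without touching the cluster, which makes it easy to review them first. Like the one-liner, this only frees space temporarily and does not address the root cause.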

Votes: 3
Original page content provided by Server Fault. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://serverfault.com/questions/966890
