Kubernetes livenessProbe: some containers stop with a failure while others succeed. What is the reason?
Stack Overflow user
Asked on 2022-11-28 10:50:14
1 answer · 56 views · 0 followers · score 0

Deep diving into this question: I have a scheduled job and a never-ending container in the same pod. To kill the never-ending container when the scheduled job has finished, I am using a liveness probe.

Code language: yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-failed
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      activeDeadlineSeconds: 300
      backoffLimit: 4
      template:
        spec:
          containers:
          - name: docker-http-server
            image: katacoda/docker-http-server:latest
            ports:
            - containerPort: 80
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
            livenessProbe:
              exec:
                command:
                - sh
                - -c
                - if test -f "/cache/stop"; then exit 1; fi;
              initialDelaySeconds: 5
              periodSeconds: 5
          - name: busy
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - sh
            - -c
            args:
            - echo start > /cache/start; sleep 15; echo stop >  /cache/stop; 
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
          restartPolicy: Never
          volumes:
          - name: cache-volume
            emptyDir:
              sizeLimit: 10Mi

As you can see, the cron job writes the /cache/stop file, which stops the never-ending container. The problem is that with some images the never-ending container stops with a failure. Is there a way to stop every container successfully?
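Before looking at the failure, the probe's stop-file check itself can be sanity-checked locally with a plain POSIX shell; here a temp directory stands in for the /cache emptyDir volume, so the paths below are illustrative:

```shell
#!/bin/sh
# The liveness command exits 0 while the stop file is absent (probe passes)
# and 1 once it exists (probe fails, so the kubelet kills the container).
CACHE=$(mktemp -d)   # stand-in for the /cache emptyDir volume

probe() { if test -f "$CACHE/stop"; then exit 1; fi; }

rc=0; ( probe ) || rc=$?
echo "before stop file: exit $rc"    # prints: before stop file: exit 0

touch "$CACHE/stop"                  # what the busybox sidecar does at the end

rc=0; ( probe ) || rc=$?
echo "after stop file: exit $rc"     # prints: after stop file: exit 1
```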

Code language: text
Name:                     pod-failed-27827190
Namespace:                default
Selector:                 controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
Labels:                   controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
                          job-name=pod-failed-27827190
Annotations:              batch.kubernetes.io/job-tracking: 
Controlled By:            CronJob/pod-failed
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Mon, 28 Nov 2022 11:30:00 +0100
Active Deadline Seconds:  300s
Pods Statuses:            0 Active (0 Ready) / 0 Succeeded / 5 Failed
Pod Template:
  Labels:  controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
           job-name=pod-failed-27827190
  Containers:
   docker-http-server:
    Image:        katacoda/docker-http-server:latest
    Port:         80/TCP
    Host Port:    0/TCP
    Liveness:     exec [sh -c if test -f "/cache/stop"; then exit 1; fi;] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /cache from cache-volume (rw)
   busy:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
    Args:
      echo start > /cache/start; sleep 15; echo stop >  /cache/stop;
    Environment:  <none>
    Mounts:
      /cache from cache-volume (rw)
  Volumes:
   cache-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  10Mi
Events:
  Type     Reason                Age   From            Message
  ----     ------                ----  ----            -------
  Normal   SuccessfulCreate      2m5s  job-controller  Created pod: pod-failed-27827190-8tqxk
  Normal   SuccessfulCreate      102s  job-controller  Created pod: pod-failed-27827190-4gj2s
  Normal   SuccessfulCreate      79s   job-controller  Created pod: pod-failed-27827190-5wgfg
  Normal   SuccessfulCreate      56s   job-controller  Created pod: pod-failed-27827190-lzv8k
  Normal   SuccessfulCreate      33s   job-controller  Created pod: pod-failed-27827190-fr8v5
  Warning  BackoffLimitExceeded  9s    job-controller  Job has reached the specified backoff limit

As you can see, the liveness probe of katacoda/docker-http-server:latest fails. This does not happen with nginx, for example.

Code language: yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-failed
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      activeDeadlineSeconds: 300
      backoffLimit: 4
      template:
        spec:
          containers:
          - name: nginx
            image: nginx
            ports:
            - containerPort: 80
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
            livenessProbe:
              exec:
                command:
                - sh
                - -c
                - if test -f "/cache/stop"; then exit 1; fi;
              initialDelaySeconds: 5
              periodSeconds: 5
          - name: busy
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - sh
            - -c
            args:
            - echo start > /cache/start; sleep 15; echo stop >  /cache/stop; 
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
          restartPolicy: Never
          volumes:
          - name: cache-volume
            emptyDir:
              sizeLimit: 10Mi

Of course, the never-ending image I have picked ends in a failure, and I cannot control this image. Is there a way to force the job/pod to succeed?

1 Answer

Stack Overflow user

Answered on 2022-11-28 13:12:27

It depends on the exit code of the container's main process. Every container receives a SIGTERM when Kubernetes wants to stop it, so that it has a chance to terminate gracefully. This also applies when the reason is a failed liveness probe. My guess is that nginx exits with code 0, while the katacoda http server returns a non-zero code. Looking at the documentation of golang's ListenAndServe method, it clearly states that it terminates with a non-nil error: https://pkg.go.dev/net/http#Server.ListenAndServe
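This exit-code difference can be reproduced with a plain POSIX shell, no cluster needed: a process with no TERM handler dies with status 143 (128 + SIGTERM), which Kubernetes counts as a failure, while one that traps TERM and exits 0 counts as a success. The two `sh -c` children below are stand-ins for the two kinds of server, not the actual images:

```shell
#!/bin/sh
# Stand-in for a server whose main process has no SIGTERM handler
# (like the katacoda http server): it dies with a non-zero status.
sh -c 'sleep 30' &
pid=$!
sleep 1
kill -TERM "$pid"
rc=0; wait "$pid" || rc=$?
echo "no TERM handler: exit $rc"     # 143 = 128 + SIGTERM

# Stand-in for a server that traps SIGTERM and shuts down cleanly
# (what nginx effectively does): it exits 0 and the container succeeds.
sh -c 'trap "exit 0" TERM; while :; do sleep 1; done' &
pid=$!
sleep 1
kill -TERM "$pid"
rc=0; wait "$pid" || rc=$?
echo "TERM handler: exit $rc"        # 0
```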

You can override the container's default command with a shell script that starts the application and then waits until the stop file is written:

Code language: yaml
containers:
  - name: docker-http-server
    image: katacoda/docker-http-server:latest
    command:
      - "sh"
      - "-c"
      - "/app & while true; do if [ -f /cache/stop ]; then exit 0; fi; sleep 1; done;"

Here, "/app" is the start command of the katacoda http server container.
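The wrapper pattern can also be simulated locally without a cluster; in this sketch a `sleep` loop stands in for "/app" and a temp directory stands in for the /cache emptyDir, so every name below is illustrative:

```shell
#!/bin/sh
# Simulation of the answer's wrapper: run the "server" in the background,
# poll for the stop file, then exit 0 regardless of how the server dies.

run_wrapper() {
  cache=$1
  sh -c 'while :; do sleep 1; done' &    # stand-in for "/app &"
  server=$!
  while [ ! -f "$cache/stop" ]; do       # the answer's polling loop
    sleep 1
  done
  kill -TERM "$server" 2>/dev/null
  wait "$server" 2>/dev/null || true     # discard the server's non-zero status
  return 0                               # main process reports success
}

CACHE=$(mktemp -d)
( sleep 2; touch "$CACHE/stop" ) &       # plays the role of the busybox sidecar
run_wrapper "$CACHE"
echo "wrapper exit: $?"                  # prints: wrapper exit: 0
```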

Score: 1
The original page content is provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's dedicated IT-domain engine.
Original link:

https://stackoverflow.com/questions/74599469
