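Diving deeper into this question: I have a scheduled cron job and a never-ending container in the same pod. To end the never-ending container once the cron job has finished, I'm using a liveness probe.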
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-failed
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      activeDeadlineSeconds: 300
      backoffLimit: 4
      template:
        spec:
          containers:
            - name: docker-http-server
              image: katacoda/docker-http-server:latest
              ports:
                - containerPort: 80
              volumeMounts:
                - mountPath: /cache
                  name: cache-volume
              livenessProbe:
                exec:
                  command:
                    - sh
                    - -c
                    - if test -f "/cache/stop"; then exit 1; fi;
                initialDelaySeconds: 5
                periodSeconds: 5
            - name: busy
              image: busybox
              imagePullPolicy: IfNotPresent
              command:
                - sh
                - -c
              args:
                - echo start > /cache/start; sleep 15; echo stop > /cache/stop;
              volumeMounts:
                - mountPath: /cache
                  name: cache-volume
          restartPolicy: Never
          volumes:
            - name: cache-volume
              emptyDir:
                sizeLimit: 10Mi

As you can see, the cron job writes the /cache/stop file, and the never-ending container is then stopped. The problem is that with some images the never-ending container stops as failed. Is there a way to stop every container successfully?
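For reference, the probe logic can be checked locally. This is only a sketch: the temporary directory stands in for /cache, and `probe` is a hypothetical helper name that is not part of the manifest. The command should exit 0 while the stop file is absent and 1 once it exists, which is what causes the liveness probe to fail.

```shell
# Temporary directory standing in for the /cache emptyDir (assumption for local testing).
probe_dir=$(mktemp -d)

# Hypothetical helper that runs the same command as the livenessProbe,
# with /cache replaced by the temporary directory.
probe() {
  sh -c "if test -f \"$probe_dir/stop\"; then exit 1; fi;"
}

probe
before=$?                          # stop file not written yet

echo stop > "$probe_dir/stop"      # what the busy container does after its sleep
probe
after=$?                           # stop file now exists

echo "probe before stop file: $before, after stop file: $after"
rm -rf "$probe_dir"
```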
Name:                     pod-failed-27827190
Namespace:                default
Selector:                 controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
Labels:                   controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
                          job-name=pod-failed-27827190
Annotations:              batch.kubernetes.io/job-tracking:
Controlled By:            CronJob/pod-failed
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Mon, 28 Nov 2022 11:30:00 +0100
Active Deadline Seconds:  300s
Pods Statuses:            0 Active (0 Ready) / 0 Succeeded / 5 Failed
Pod Template:
  Labels:  controller-uid=608efa7c-53cf-4978-9136-9fec772c1c6d
           job-name=pod-failed-27827190
  Containers:
   docker-http-server:
    Image:        katacoda/docker-http-server:latest
    Port:         80/TCP
    Host Port:    0/TCP
    Liveness:     exec [sh -c if test -f "/cache/stop"; then exit 1; fi;] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /cache from cache-volume (rw)
   busy:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
    Args:
      echo start > /cache/start; sleep 15; echo stop > /cache/stop;
    Environment:  <none>
    Mounts:
      /cache from cache-volume (rw)
  Volumes:
   cache-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  10Mi
Events:
  Type     Reason                Age   From            Message
  ----     ------                ----  ----            -------
  Normal   SuccessfulCreate      2m5s  job-controller  Created pod: pod-failed-27827190-8tqxk
  Normal   SuccessfulCreate      102s  job-controller  Created pod: pod-failed-27827190-4gj2s
  Normal   SuccessfulCreate      79s   job-controller  Created pod: pod-failed-27827190-5wgfg
  Normal   SuccessfulCreate      56s   job-controller  Created pod: pod-failed-27827190-lzv8k
  Normal   SuccessfulCreate      33s   job-controller  Created pod: pod-failed-27827190-fr8v5
  Warning  BackoffLimitExceeded  9s    job-controller  Job has reached the specified backoff limit

As you can see, katacoda/docker-http-server:latest ends as failed once the liveness probe fails. This does not happen with nginx, for example.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-failed
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      activeDeadlineSeconds: 300
      backoffLimit: 4
      template:
        spec:
          containers:
            - name: nginx
              image: nginx
              ports:
                - containerPort: 80
              volumeMounts:
                - mountPath: /cache
                  name: cache-volume
              livenessProbe:
                exec:
                  command:
                    - sh
                    - -c
                    - if test -f "/cache/stop"; then exit 1; fi;
                initialDelaySeconds: 5
                periodSeconds: 5
            - name: busy
              image: busybox
              imagePullPolicy: IfNotPresent
              command:
                - sh
                - -c
              args:
                - echo start > /cache/start; sleep 15; echo stop > /cache/stop;
              volumeMounts:
                - mountPath: /cache
                  name: cache-volume
          restartPolicy: Never
          volumes:
            - name: cache-volume
              emptyDir:
                sizeLimit: 10Mi

Of course, the never-ending image that I'm pulling ends as failed, and I have no control over that image. Is there a way to force a success status for the job/pod?
Posted on 2022-11-28 13:12:27
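It depends on the exit code of the container's main process. Every container receives a TERM signal when Kubernetes wants to stop it, which gives it the chance to end gracefully. This also applies when the reason is a failed liveness probe. I guess that nginx exits with code 0, while the katacoda http server returns a code other than 0. Looking at the documentation of Go's ListenAndServe method, it clearly states that it always returns a non-nil error: https://pkg.go.dev/net/http#Server.ListenAndServe.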
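The difference in exit codes can be reproduced outside Kubernetes. The following is a minimal sketch using ordinary shell processes, not the actual images: `sleep` stands in for a server with no TERM handler, and a small `sh -c` script stands in for one that shuts down gracefully. Note that these are regular processes; a process running as PID 1 inside a container can behave differently when it has no handler installed, so treat this only as an illustration of how the shell reports the two cases.

```shell
# Case 1: no TERM handler. The process is terminated by the signal,
# and the shell reports 128 + 15 = 143.
sleep 30 &
pid=$!
kill -TERM "$pid"
wait "$pid"
unhandled_status=$?
echo "no handler:   exit status $unhandled_status"

# Case 2: the process traps TERM, cleans up its child, and exits 0.
sh -c '
  sleep 30 &
  child=$!
  trap "kill $child; exit 0" TERM
  wait
' &
pid=$!
sleep 1                     # give the child shell time to install its trap
kill -TERM "$pid"
wait "$pid"
handled_status=$?
echo "with handler: exit status $handled_status"
```

A container whose main process ends like the first case is reported as failed, while one that ends like the second case is reported as succeeded.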
You can override the container's default command with a shell script that starts the application and then waits until the stop file is written:
containers:
  - name: docker-http-server
    image: katacoda/docker-http-server:latest
    command:
      - "sh"
      - "-c"
      - "/app & while true; do if [ -f /cache/stop ]; then exit 0; fi; sleep 1; done;"

Here, "/app" is the start command of the katacoda http server container.
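The wrapper loop can be tried locally before putting it into the manifest. This is a sketch under stated assumptions: a temporary directory stands in for /cache, `sleep 60` stands in for /app, and a background subshell plays the role of the busy container. Unlike the one-liner in the manifest, the sketch kills the stand-in process explicitly, because outside a container nothing else would clean it up when the wrapper exits.

```shell
# Temporary directory standing in for the /cache emptyDir (assumption).
cache_dir=$(mktemp -d)

# Stand-in for the busy container: write the stop file after a short delay.
( sleep 2; echo stop > "$cache_dir/stop" ) &

# Wrapper: start the stand-in server, then poll for the stop file.
(
  sleep 60 &                # stand-in for /app
  app_pid=$!
  while true; do
    if [ -f "$cache_dir/stop" ]; then
      kill "$app_pid"       # local cleanup only; not in the original one-liner
      exit 0
    fi
    sleep 1
  done
)
wrapper_status=$?

echo "wrapper exit status: $wrapper_status"
rm -rf "$cache_dir"
```

Because the wrapper itself is the container's main process and exits with 0, the container should be reported as succeeded regardless of how the wrapped server would have ended.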
https://stackoverflow.com/questions/74599469