I maintain a Kubernetes cluster that contains two PostgreSQL servers in two different pods: a primary and a replica. The replica is kept in sync with the primary via log shipping.
A fault caused log shipping to start failing, so the replica is no longer in sync with the primary.
The procedure for re-syncing the replica with the primary requires, among other things, stopping the replica's postgres service. This is where I run into trouble.
When I shut down the postgres service, Kubernetes appears to restart the container, which immediately restarts postgres. I need the container running the postgres service to stay down so that I can carry out the remaining steps to repair the broken replication.
How can I get Kubernetes to let me shut down the postgres service without restarting the container?
Details:
To stop replication, I open a shell on the replica pod with kubectl exec -it <pod name> -- /bin/sh and then run pg_ctl stop from that shell. I get the following response:
server shutting down
command terminated with exit code 137
and I am kicked out of the shell.
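(As an aside, exit code 137 follows the usual 128 + signal-number convention: 128 + 9 means the process received SIGKILL, which is consistent with the kubelet tearing the container down rather than postgres exiting on its own. The convention is easy to check locally:)

```shell
# A process killed with SIGKILL (signal 9) reports exit status 128 + 9 = 137.
sh -c 'kill -KILL $$'
echo $?   # prints 137
```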
When I run kubectl describe pod, I see the following:
Name: pgset-primary-1
Namespace: qa
Priority: 0
Node: aks-nodepool1-95718424-0/10.240.0.4
Start Time: Fri, 09 Jul 2021 13:48:06 +1200
Labels: app=pgset-primary
controller-revision-hash=pgset-primary-6d7d65c8c7
name=pgset-replica
statefulset.kubernetes.io/pod-name=pgset-primary-1
Annotations: <none>
Status: Running
IP: 10.244.1.42
IPs:
IP: 10.244.1.42
Controlled By: StatefulSet/pgset-primary
Containers:
pgset-primary:
Container ID: containerd://bc00b4904ab683d9495ad020328b5033ecb00d19c9e5b12d22de18f828918455
Image: *****/crunchy-postgres:centos7-9.6.8-1.6.0
Image ID: docker.io/*****/crunchy-postgres@sha256:2850e00f9a619ff4bb6ff889df9bcb2529524ca8110607e4a7d9e36d00879057
Port: 5432/TCP
Host Port: 0/TCP
State: Running
Started: Sat, 06 Nov 2021 18:29:34 +1300
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 06 Nov 2021 18:28:09 +1300
Finished: Sat, 06 Nov 2021 18:29:18 +1300
Ready: True
Restart Count: 6
Limits:
cpu: 250m
memory: 512Mi
Requests:
cpu: 10m
memory: 256Mi
Environment:
PGHOST: /tmp
PG_PRIMARY_USER: primaryuser
PG_MODE: set
PG_PRIMARY_HOST: pgset-primary
PG_REPLICA_HOST: pgset-replica
PG_PRIMARY_PORT: 5432
[...]
ARCHIVE_TIMEOUT: 60
MAX_WAL_KEEP_SEGMENTS: 400
Mounts:
/backrestrepo from backrestrepo (rw)
/pgconf from pgbackrestconf (rw)
/pgdata from pgdata (rw)
/var/run/secrets/kubernetes.io/serviceaccount from pgset-sa-token-nh6ng (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
pgdata:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pgdata-pgset-primary-1
ReadOnly: false
backrestrepo:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: backrestrepo-pgset-primary-1
ReadOnly: false
pgbackrestconf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: pgbackrest-configmap
Optional: false
pgset-sa-token-nh6ng:
Type: Secret (a volume populated by a Secret)
SecretName: pgset-sa-token-nh6ng
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 88m (x3 over 3h1m) kubelet Back-off restarting failed container
Normal Pulled 88m (x7 over 120d) kubelet Container image "*****/crunchy-postgres:centos7-9.6.8-1.6.0" already present on machine
Normal Created 88m (x7 over 120d) kubelet Created container pgset-primary
Normal Started 88m (x7 over 120d) kubelet Started container pgset-primary
These events show that the container was started by Kubernetes.
The pod has no liveness or readiness probes, so I don't know what prompts Kubernetes to restart the container when I shut down the postgres service running inside it.
Posted on 2021-11-06 13:24:30
This is caused by the restartPolicy. The container's lifecycle ends because its process has completed. If you don't want a new container to be created, you need to change the restart policy of these pods.
If the pod is part of a Deployment, have a look at kubectl explain deployment.spec.template.spec.restartPolicy
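For illustration, restartPolicy sits at the pod-spec level. One caveat worth knowing: workload controllers such as Deployments and StatefulSets (which is what manages the pod above) only accept Always in their pod template, so Never or OnFailure can only be set on a bare Pod. A minimal sketch, with hypothetical names and a generic image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pg-maintenance        # hypothetical name, for illustration only
spec:
  restartPolicy: Never        # Always (default) | OnFailure | Never
  containers:
  - name: postgres
    image: postgres:9.6       # stand-in for the actual image
```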
https://stackoverflow.com/questions/69862466