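I'm running a Kubernetes cluster on Google Cloud Platform via their Kubernetes Engine. The cluster version is 1.13.11-gke.14. The PHP application pod contains two containers: Nginx as a reverse proxy and php-fpm (7.2).
A TCP load balancer is used in Google Cloud, and internal routing is then handled by the Nginx Ingress.
The problem: when I upload a larger file (17 MB), the ingress crashes with the following errors: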
W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX:
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required.
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b
E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:13.156963Z Exiting with 0

Here is the YAML configuration of the Nginx ingress. It is GitLab's default configuration; GitLab created the cluster itself.
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2019-11-24T17:35:04Z"
  generation: 3
  labels:
    app: nginx-ingress
    chart: nginx-ingress-1.22.1
    component: controller
    heritage: Tiller
    release: ingress
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
  resourceVersion: "2638973"
  selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
  uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-ingress
      release: ingress
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: nginx-ingress
        component: controller
        release: ingress
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
        - --election-id=ingress-controller-leader
        - --ingress-class=nginx
        - --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: nginx-ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 33
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/nginx/modsecurity/modsecurity.conf
          name: modsecurity-template-volume
          subPath: modsecurity.conf
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
      - args:
        - /bin/sh
        - -c
        - tail -f /var/log/modsec/audit.log
        image: busybox
        imagePullPolicy: Always
        name: modsecurity-log
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ingress-nginx-ingress
      serviceAccountName: ingress-nginx-ingress
      terminationGracePeriodSeconds: 60
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: modsecurity.conf
            path: modsecurity.conf
          name: ingress-nginx-ingress-controller
        name: modsecurity-template-volume
      - emptyDir: {}
        name: modsecurity-log-volume

I don't know what else to try. I'm running the cluster on 3 nodes (2x 1 vCPU, 1.5 GB and 1x preemptible 2 vCPU, 1.8 GB), all of them on SSD drives.
Whenever I upload an image, disk I/O goes through the roof.
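Posted on 2020-02-27 18:05:31
Found the solution. The Nginx-ingress pod also contains ModSecurity. Every request was being analyzed by ModSecurity, and the larger uploads triggered these crashes. The ingress was not actually crashing; it was consuming so much CPU and I/O that health check responses for all the other pods took longer. The solution is to configure ModSecurity properly or to disable it.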
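
As a rough sketch of the "disable" option: the controller reads its settings from the ConfigMap named in its `--configmap` argument, which in the Deployment above is `gitlab-managed-apps/ingress-nginx-ingress-controller`. This assumes ModSecurity was enabled through the `enable-modsecurity` key of that ConfigMap; the question does not show the ConfigMap's contents, so check it before changing anything. Since GitLab manages this chart, it may also overwrite manual edits.

```yaml
# Sketch only: assumes ModSecurity is toggled via the ingress-nginx
# ConfigMap key "enable-modsecurity" (not shown in the question).
# Only the relevant key is listed; keep the existing keys
# (such as modsecurity.conf) when editing the real ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
data:
  enable-modsecurity: "false"
```

To keep ModSecurity on instead, the request-body handling in the mounted `modsecurity.conf` is the likely place to tune, for example the `SecRequestBodyAccess` and `SecRequestBodyLimit` directives. Suitable values depend on the upload sizes the application needs to accept.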
https://stackoverflow.com/questions/59126572