首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >通过php上传文件后Kubernetes - nginx-ingress崩溃

通过php上传文件后Kubernetes - nginx-ingress崩溃
EN

Stack Overflow用户
提问于 2019-12-01 22:41:02
回答 1查看 633关注 0票数 0

我通过他们的Kubernetes引擎在Google Cloud平台上运行Kubernetes集群。集群版本为1.13.11-gke.14。PHP应用程序pod包含两个容器- Nginx作为反向代理和php-fpm (7.2)。

在google cloud中使用TCP负载均衡器,然后通过Nginx Ingress进行内部路由。

问题是:当我上传一些更大的文件(17MB)时,入口崩溃了,错误如下:

代码语言:javascript
复制
W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory 
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX: 
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory 
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required. 
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory

E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused 
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b

E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused 
I 2019-12-01T14:26:13.156963Z Exiting with 0 

这是Nginx入口的yaml配置。配置由Gitlab的系统默认,Gitlab自己创建集群。

代码语言:javascript
复制
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2019-11-24T17:35:04Z"
  generation: 3
  labels:
    app: nginx-ingress
    chart: nginx-ingress-1.22.1
    component: controller
    heritage: Tiller
    release: ingress
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
  resourceVersion: "2638973"
  selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
  uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-ingress
      release: ingress
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: nginx-ingress
        component: controller
        release: ingress
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
        - --election-id=ingress-controller-leader
        - --ingress-class=nginx
        - --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: nginx-ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 33
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/nginx/modsecurity/modsecurity.conf
          name: modsecurity-template-volume
          subPath: modsecurity.conf
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
      - args:
        - /bin/sh
        - -c
        - tail -f /var/log/modsec/audit.log
        image: busybox
        imagePullPolicy: Always
        name: modsecurity-log
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ingress-nginx-ingress
      serviceAccountName: ingress-nginx-ingress
      terminationGracePeriodSeconds: 60
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: modsecurity.conf
            path: modsecurity.conf
          name: ingress-nginx-ingress-controller
        name: modsecurity-template-volume
      - emptyDir: {}
        name: modsecurity-log-volume

我不知道还能尝试什么。我在3个节点上运行集群(2x 1vCPU,1.5 of和1x Preemptile 2vCPU,1.8 of),它们都在SSD驱动器上。

每当我上传图像时,磁盘IO都会抓狂。

Disk IOPS Disk I/O感谢您的帮助。

EN

回答 1

Stack Overflow用户

发布于 2020-02-27 18:05:31

找到解决方案了。Nginx-ingress pod也包含modsecurity。所有的请求都是由mod安全分析的,更大的上传文件导致了这些崩溃。它根本没有崩溃,但占用了太多CPU和I/O,这导致对所有其他pod的健康检查响应时间更长。解决方案是正确配置modsecurity或禁用。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/59126572

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档