Q: GCP Kubernetes Engine - nginx ingress controller crashes after a large file upload

Server Fault user

Asked 2020-02-14 14:18:53

Answers 2 · Views 2.1K · Followers 0 · Votes 2

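I'm trying my luck here to solve a problem I'm having on Google Kubernetes Engine (GKE).

In short: when I upload a 15-20 MB file through my PHP application, the nginx ingress controller crashes. Disk IO rises quickly, then CPU rises, and it takes about 5-30 minutes until IO and CPU drop and everything restarts successfully.

Below are the logs from the nginx ingress controller container, with my comments on what happens at each stage:

The upload is successfully received by the app: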

INFO 2020-02-14 14:30:55.481 CET 10.102.1.1 - [10.102.1.1] - - [14/Feb/2020:13:30:55 +0000] "POST /api/v1/contracts/38141/file-system/upload HTTP/2.0" 499 0

NGINX starts producing a large number of log lines like this:

INFO 2020-02-14 14:30:55.819 CET *�I�g�*��\u001AnK67�@?+�(%u052f��O�yqq$+u$,�b�<*�9#\t��\u0003d\u0006+����I�]A�%u0110jv��hAp\"�63�9\u0019Q�{�x|K�\u000BE\u001C��\"-P%u0079�\u001Ed�Tv

After many such lines, the logs report that the ingress endpoints are unavailable:

WARN 2020-02-14T13:31:05.505984Z Service "gitlab-managed-apps/ingress-nginx-ingress-default-backend" does not have any active Endpoint 
WARN 2020-02-14 14:31:05.526 CET Service "my-app/my-app" does not have any active Endpoint.
WARN 2020-02-14 14:31:05.526 CET Service "my-app/app-staging" does not have any active Endpoint.

... access logs skipped ...

WARN 2020-02-14 14:32:34.419 CET failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded
2020-02-14 14:32:42.227 CET attempting to acquire leader lease gitlab-managed-apps/ingress-controller-leader-nginx...
ERROR 2020-02-14 14:32:43.464 CET Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again

Now the client's file upload happens again, along with another large batch of garbled-symbol logs. After those, the following is logged:

INFO 2020-02-14T13:33:37.525466Z Received SIGTERM, shutting down 
INFO 2020-02-14T13:33:55.513100Z Received SIGTERM, shutting down 
INFO 2020-02-14T13:33:55.513155Z Shutting down controller queues 
INFO 2020-02-14T13:33:55.516017Z updating status of Ingress rules (remove) 
ERROR 2020-02-14T13:33:55.570340Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
INFO 2020-02-14T13:33:55.574690Z Shutting down controller queues 
INFO 2020-02-14T13:33:55.576049Z updating status of Ingress rules (remove) 
ERROR 2020-02-14T13:33:55.610722Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:33:55.774881Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
INFO 2020-02-14T13:33:55.776321Z failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded 
INFO 2020-02-14T13:33:55.781376Z attempting to acquire leader lease  gitlab-managed-apps/ingress-controller-leader-nginx... 
INFO 2020-02-14T13:33:56.826124Z successfully acquired lease gitlab-managed-apps/ingress-controller-leader-nginx 
INFO 2020-02-14T13:33:56.833827Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh 
ERROR 2020-02-14T13:33:56.933107Z queue has been shutdown, failed to enqueue: &ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,ManagedFields:[],} 
INFO 2020-02-14T13:33:58.027600Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh 
ERROR 2020-02-14T13:33:58.117920Z Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again 
INFO 2020-02-14T13:33:59.709458Z Stopping NGINX process 
INFO 2020-02-14T13:33:59.718181Z Stopping NGINX process 
ERROR 2020-02-14T13:34:03.010148Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: dial unix /tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:12.627155Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:12.832624Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:13.693853Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:13.693930Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
INFO 2020-02-14T13:34:41.620594055Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620664183Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.620671154Z   Release:       0.25.1
INFO 2020-02-14T13:34:41.620675964Z   Build:         git-5179893a9
INFO 2020-02-14T13:34:41.620681055Z   Repository:    https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.620686042Z   nginx version:     openresty/1.15.8.1
INFO 2020-02-14T13:34:41.620691348Z 
INFO 2020-02-14T13:34:41.620695778Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620701128Z 
INFO 2020-02-14T13:34:41.622564Z Watching for Ingress class: nginx 
WARN 2020-02-14T13:34:41.622863Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false) 
INFO 2020-02-14T13:34:41.623360607Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623418446Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.623425256Z   Release:       0.25.1
INFO 2020-02-14T13:34:41.623426Z Watching for Ingress class: nginx 
INFO 2020-02-14T13:34:41.623430244Z   Build:         git-5179893a9
INFO 2020-02-14T13:34:41.623435128Z   Repository:    https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.623441533Z   nginx version: openresty/1.15.8.1
INFO 2020-02-14T13:34:41.623447006Z 
INFO 2020-02-14T13:34:41.623451329Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623456382Z 
WARN 2020-02-14T13:34:41.623731Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false) 
ERROR 2020-02-14T13:34:41.629507140Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.633116Z Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work. 
INFO 2020-02-14T13:34:41.633644Z Creating API client for https://10.103.0.1:443 
ERROR 2020-02-14T13:34:41.640959117Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.642065Z Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work. 
INFO 2020-02-14T13:34:41.642376Z Creating API client for https://10.103.0.1:443 
INFO 2020-02-14T13:34:41.682018Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64 
INFO 2020-02-14T13:34:41.700374Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64

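As you can see, nginx has crashed and restarted (I don't know why).

My questions: What could cause the nginx health check to fail and the pod to be terminated? Can I configure nginx ingress in some way to prevent this? Could it be caused by the heavy logging saturating the disk? Or is nginx buffering the uploaded file and taking too long to respond to the health check? How can I avoid it?

Here are the annotations I have tried for nginx. The problem occurs both with and without them: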

nginx.ingress.kubernetes.io/client-body-buffer-size: 5m
nginx.ingress.kubernetes.io/proxy-body-size: 15m
nginx.ingress.kubernetes.io/proxy-buffering: "on"
nginx.org/client-max-body-size: 15m
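
For context, here is a minimal sketch of where such annotations would sit on an Ingress resource. It is not the asker's actual manifest: the resource name, host, and service port are assumptions for illustration, while the namespace, service name, and ingress class are taken from the logs above. Two observations that may be relevant: the `nginx.org/` prefix belongs to the separate NGINX Inc. controller and, as far as I know, is not read by kubernetes/ingress-nginx, so it is left out here; and a `proxy-body-size` of 15m makes nginx reject request bodies larger than 15 MB with HTTP 413.

```yaml
# Sketch only: name, host, and servicePort are assumptions, not from the question.
apiVersion: extensions/v1beta1        # Ingress API group served by Kubernetes 1.13
kind: Ingress
metadata:
  name: my-app                        # assumed resource name
  namespace: my-app                   # namespace seen in the controller logs
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/client-body-buffer-size: 5m
    # Bodies above this size are rejected with 413; raise it if uploads exceed 15 MB.
    nginx.ingress.kubernetes.io/proxy-body-size: 15m
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
spec:
  rules:
    - host: app.example.com           # hypothetical host
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app     # service name seen in the controller logs
              servicePort: 80         # assumed port
```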

Technologies and versions:
Kubernetes master version: 1.13.12-gke.25
Nodes: 1.13.11-gke.14
Nginx ingress controller: 0.25.1

Thanks for your help, as I don't know what else to try.

ingress

nginx

google-kubernetes-engine

upload

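2 Answers

Server Fault user

Posted 2020-02-14 20:23:07

To mitigate the failing health checks, I suggest raising the timeout value of your health check and upgrading to 0.26.0, since a fix appears to have been included in that version. I also suggest applying these optimizations to nginx to reduce buffering time. Keep in mind that Google does not support these optimizations.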
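
One way to raise the health check timeout is sketched below as a fragment of the controller Deployment's container spec. The numeric values are assumptions chosen for illustration, not settings from this answer. The `/healthz` path appears in the logs above, port 10254 is the controller's default health port, and Kubernetes uses a `timeoutSeconds` of 1 when the field is unset.

```yaml
# Fragment of the ingress controller container spec (sketch; values are assumptions).
livenessProbe:
  httpGet:
    path: /healthz
    port: 10254            # default health check port of ingress-nginx
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5        # assumed value; Kubernetes defaults to 1 second
  failureThreshold: 5      # assumed value; tolerate more slow responses before a restart
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
    scheme: HTTP
  periodSeconds: 10
  timeoutSeconds: 5        # assumed value
  failureThreshold: 5      # assumed value
```

A longer timeout only gives a busy controller more time to answer. It does not remove whatever is saturating disk IO in the first place.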

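Votes 1

Server Fault user

Posted 2020-02-18 17:09:57

It looks like I have solved the problem. Nginx-ingress also includes modsecurity (WAF), which had many rules enabled. After I disabled modsecurity, the huge logs disappeared, and so far everything seems to work. I can now successfully upload a 30 MB file 20 times at once without any logging or disk I/O problems. I will update this answer over the weekend if the fix holds up over time.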
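
The answer does not say how modsecurity was disabled. One possible way, sketched here under assumptions, is through the controller's ConfigMap, using the `enable-modsecurity` and `enable-owasp-modsecurity-crs` options of kubernetes/ingress-nginx. The ConfigMap name is a guess based on the pod name in the logs, and the namespace is the one the logs show.

```yaml
# Sketch only: the ConfigMap name is an assumption derived from the pod name in the logs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-ingress-controller   # assumed; must match the controller's --configmap flag
  namespace: gitlab-managed-apps           # namespace seen in the controller logs
data:
  enable-modsecurity: "false"              # turn off the ModSecurity WAF module
  enable-owasp-modsecurity-crs: "false"    # turn off the OWASP core rule set
```

If the controller was installed through a Helm chart or a managed installer, direct edits to the ConfigMap may be overwritten on the next upgrade, so the same keys would need to be set in that installer's configuration instead.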

Votes 1

Original page content provided by Server Fault. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://serverfault.com/questions/1003089
