
GCP Kubernetes Engine - nginx ingress controller crashes after large file upload

Server Fault user
Asked 2020-02-14 14:18:53
Answers: 2, Views: 2.1K, Followers: 0, Votes: 2

I'm trying my luck here to solve a problem I'm having on Kubernetes Engine.

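In short, the problem is this: when I upload a 15-20 MB file through my PHP application, the nginx ingress controller crashes. Disk I/O rises sharply, then CPU rises, and it takes about 5-30 minutes until I/O and CPU drop again and everything restarts successfully.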

Below are the logs from the nginx ingress controller container, with my comments on what is happening:

The upload is received successfully in the app:

INFO 2020-02-14 14:30:55.481 CET 10.102.1.1 - [10.102.1.1] - - [14/Feb/2020:13:30:55 +0000] "POST /api/v1/contracts/38141/file-system/upload HTTP/2.0" 499 0

NGINX starts producing a large number of log lines like this:

INFO 2020-02-14 14:30:55.819 CET *�I�g�*��\u001AnK67�@?+�(%u052f��O�yqq$+u$,�b�<*�9#\t��\u0003d\u0006+����I�]A�%u0110jv��hAp\"�63�9\u0019Q�{�x|K�\u000BE\u001C��\"-P%u0079�\u001Ed�Tv
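The line above appears to be binary data (possibly raw bytes of the uploaded file) that was decoded as text and written to the log. When reviewing a large log dump, it can help to separate such lines from normal messages. The following is a minimal Python sketch using an assumed heuristic: a line counts as garbled when at least 10% of it consists of U+FFFD replacement characters, control characters, or literal `\uXXXX` escapes. The function names and the threshold are hypothetical and not part of any ingress-nginx tooling.

```python
import re
import unicodedata

# Literal escape sequences such as "\u001A" that the log pipeline wrote out as text.
_ESCAPE_RE = re.compile(r"\\u[0-9A-Fa-f]{4}")


def garbage_ratio(line: str) -> float:
    """Fraction of a log line that looks like undecodable binary data."""
    if not line:
        return 0.0
    escapes = _ESCAPE_RE.findall(line)
    # Drop the escapes so each one counts as a single suspicious "character".
    rest = _ESCAPE_RE.sub("", line)
    bad = len(escapes)
    for ch in rest:
        # U+FFFD is what a decoder emits for bytes that are not valid UTF-8;
        # Unicode categories starting with "C" are control/format/unassigned chars.
        if ch == "\ufffd" or (ch != "\t" and unicodedata.category(ch).startswith("C")):
            bad += 1
    return bad / (len(escapes) + len(rest))


def looks_garbled(line: str, threshold: float = 0.1) -> bool:
    """Hypothetical heuristic: flag lines where at least `threshold` looks binary."""
    return garbage_ratio(line) >= threshold
```

For example, `[l for l in lines if not looks_garbled(l)]` keeps only the readable entries. The threshold is a judgment call: a lower value catches shorter bursts of binary data but risks flagging legitimate lines that contain non-ASCII text.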

After many such lines, messages appear saying that the ingress endpoints are unavailable:

WARN 2020-02-14T13:31:05.505984Z Service "gitlab-managed-apps/ingress-nginx-ingress-default-backend" does not have any active Endpoint 
WARN 2020-02-14 14:31:05.526 CET Service "my-app/my-app" does not have any active Endpoint.
WARN 2020-02-14 14:31:05.526 CET Service "my-app/app-staging" does not have any active Endpoint.

... access logs skipped ...

WARN 2020-02-14 14:32:34.419 CET failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded
2020-02-14 14:32:42.227 CET attempting to acquire leader lease gitlab-managed-apps/ingress-controller-leader-nginx...
ERROR 2020-02-14 14:32:43.464 CET Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again
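The excerpts mix two timestamp formats: local time with a `CET` suffix (for example `2020-02-14 14:31:05.526 CET`) and UTC in ISO 8601 form with a `Z` suffix and up to nanosecond precision (for example `2020-02-14T13:31:05.505984Z`). To merge them into a single timeline, both can be normalized to UTC. A minimal sketch, assuming that CET means a fixed UTC+01:00 offset (true in February, when no daylight saving applies) and that lines look like `[LEVEL] timestamp message`; `parse_line` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: "CET" is a fixed UTC+01:00 offset (no daylight saving in February).
CET = timezone(timedelta(hours=1))

# "[LEVEL ]2020-02-14 14:31:05.526 CET message"
_LOCAL_RE = re.compile(r"^(?:(\w+) )?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+) CET ?(.*)$")
# "[LEVEL ]2020-02-14T13:33:55.513100Z message"  (fraction may have up to 9 digits)
_UTC_RE = re.compile(r"^(?:(\w+) )?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z ?(.*)$")


def _fraction_to_us(digits: str) -> int:
    # datetime stores microseconds, so pad or truncate the fraction to 6 digits.
    return int(digits[:6].ljust(6, "0"))


def parse_line(line: str):
    """Return (level or None, UTC datetime, message), or None if no known timestamp."""
    m = _LOCAL_RE.match(line)
    if m:
        level, stamp, frac, msg = m.groups()
        dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=CET)
    else:
        m = _UTC_RE.match(line)
        if not m:
            return None
        level, stamp, frac, msg = m.groups()
        dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    dt = dt.replace(microsecond=_fraction_to_us(frac))
    return level, dt.astimezone(timezone.utc), msg.strip()
```

For example, `sorted(filter(None, map(parse_line, lines)), key=lambda t: t[1])` orders entries from both formats chronologically, which shows that the `CET` and `Z` lines about missing endpoints were logged within about 20 ms of each other.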

Next, the client's file upload is logged again, followed by many more lines of garbled symbols. After those, the following is logged:

INFO 2020-02-14T13:33:37.525466Z Received SIGTERM, shutting down 
INFO 2020-02-14T13:33:55.513100Z Received SIGTERM, shutting down 
INFO 2020-02-14T13:33:55.513155Z Shutting down controller queues 
INFO 2020-02-14T13:33:55.516017Z updating status of Ingress rules (remove) 
ERROR 2020-02-14T13:33:55.570340Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
INFO 2020-02-14T13:33:55.574690Z Shutting down controller queues 
INFO 2020-02-14T13:33:55.576049Z updating status of Ingress rules (remove) 
ERROR 2020-02-14T13:33:55.610722Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:33:55.774881Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
INFO 2020-02-14T13:33:55.776321Z failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded 
INFO 2020-02-14T13:33:55.781376Z attempting to acquire leader lease  gitlab-managed-apps/ingress-controller-leader-nginx... 
INFO 2020-02-14T13:33:56.826124Z successfully acquired lease gitlab-managed-apps/ingress-controller-leader-nginx 
INFO 2020-02-14T13:33:56.833827Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh 
ERROR 2020-02-14T13:33:56.933107Z queue has been shutdown, failed to enqueue: &ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,ManagedFields:[],} 
INFO 2020-02-14T13:33:58.027600Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh 
ERROR 2020-02-14T13:33:58.117920Z Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again 
INFO 2020-02-14T13:33:59.709458Z Stopping NGINX process 
INFO 2020-02-14T13:33:59.718181Z Stopping NGINX process 
ERROR 2020-02-14T13:34:03.010148Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: dial unix /tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:12.627155Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:12.832624Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:13.693853Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
ERROR 2020-02-14T13:34:13.693930Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout 
INFO 2020-02-14T13:34:41.620594055Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620664183Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.620671154Z   Release:       0.25.1
INFO 2020-02-14T13:34:41.620675964Z   Build:         git-5179893a9
INFO 2020-02-14T13:34:41.620681055Z   Repository:    https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.620686042Z   nginx version:     openresty/1.15.8.1
INFO 2020-02-14T13:34:41.620691348Z 
INFO 2020-02-14T13:34:41.620695778Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620701128Z 
INFO 2020-02-14T13:34:41.622564Z Watching for Ingress class: nginx 
WARN 2020-02-14T13:34:41.622863Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false) 
INFO 2020-02-14T13:34:41.623360607Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623418446Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.623425256Z   Release:       0.25.1
INFO 2020-02-14T13:34:41.623426Z Watching for Ingress class: nginx 
INFO 2020-02-14T13:34:41.623430244Z   Build:         git-5179893a9
INFO 2020-02-14T13:34:41.623435128Z   Repository:    https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.623441533Z   nginx version: openresty/1.15.8.1
INFO 2020-02-14T13:34:41.623447006Z 
INFO 2020-02-14T13:34:41.623451329Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623456382Z 
WARN 2020-02-14T13:34:41.623731Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false) 
ERROR 2020-02-14T13:34:41.629507140Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.633116Z Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work. 
INFO 2020-02-14T13:34:41.633644Z Creating API client for https://10.103.0.1:443 
ERROR 2020-02-14T13:34:41.640959117Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.642065Z Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work. 
INFO 2020-02-14T13:34:41.642376Z Creating API client for https://10.103.0.1:443 
INFO 2020-02-14T13:34:41.682018Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64 
INFO 2020-02-14T13:34:41.700374Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64 

As you can see, nginx crashed and restarted (I don't know why).

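My questions are: What could cause the nginx health checks to fail so that the pod gets terminated? Can I configure nginx ingress somehow to prevent this? Could it be caused by the huge amount of logging and the resulting disk I/O? Or is nginx buffering the uploaded file and taking too long to respond to the health check? How can I avoid it?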

Here are the nginx annotations I have tried; it doesn't work with them or without them:

nginx.ingress.kubernetes.io/client-body-buffer-size: 5m
nginx.ingress.kubernetes.io/proxy-body-size: 15m
nginx.ingress.kubernetes.io/proxy-buffering: "on"
nginx.org/client-max-body-size: 15m
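In kubernetes/ingress-nginx, `nginx.ingress.kubernetes.io/proxy-body-size` sets nginx's `client_max_body_size` (larger bodies are rejected with `413 Request Entity Too Large`, and `0` disables the check), and `nginx.ingress.kubernetes.io/client-body-buffer-size` sets `client_body_buffer_size`; a body larger than that buffer is written to a temporary file on disk. `proxy-buffering` affects buffering of the upstream response, not the upload, and `nginx.org/*` annotations are read by NGINX Inc.'s controller, not by kubernetes/ingress-nginx (whose default annotation prefix is `nginx.ingress.kubernetes.io`). The sketch below applies those rules to the annotation values above. nginx sizes are 1024-based, so `15m` is 15,728,640 bytes, and a 20 MB (20,000,000-byte) upload would exceed it. The function names are hypothetical, and real behaviour also depends on settings not modelled here.

```python
import re

# nginx size syntax: a number with an optional k/K or m/M suffix, 1024-based.
_SIZE_RE = re.compile(r"^\s*(\d+)\s*([kKmM]?)\s*$")
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2}


def parse_nginx_size(value: str) -> int:
    """Convert an nginx size string such as '5m' or '512k' to bytes."""
    m = _SIZE_RE.match(value)
    if not m:
        raise ValueError(f"not an nginx size: {value!r}")
    number, unit = m.groups()
    return int(number) * _UNITS[unit.lower()]


def classify_upload(body_bytes: int, proxy_body_size: str, client_body_buffer_size: str) -> str:
    """Rough outcome for one request body under the two ingress-nginx annotations."""
    max_body = parse_nginx_size(proxy_body_size)  # becomes client_max_body_size
    buffer = parse_nginx_size(client_body_buffer_size)  # becomes client_body_buffer_size
    if max_body and body_bytes > max_body:  # a limit of 0 disables the check
        return "rejected (413)"
    if body_bytes > buffer:
        return "buffered to temp file on disk"
    return "buffered in memory"


def foreign_annotations(annotations: dict) -> list:
    """Keys that kubernetes/ingress-nginx ignores with its default annotation prefix."""
    return [k for k in annotations if not k.startswith("nginx.ingress.kubernetes.io/")]


if __name__ == "__main__":
    # The annotation values from the question.
    limit, buf = "15m", "5m"
    for size in (15_000_000, 20_000_000):
        print(size, classify_upload(size, limit, buf))
    print(foreign_annotations({
        "nginx.ingress.kubernetes.io/proxy-body-size": limit,
        "nginx.org/client-max-body-size": limit,
    }))
```

Under these rules, a 15 MB upload would be spooled to disk (it exceeds the 5 MiB buffer), and a 20 MB upload would be rejected before reaching the app.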

Technologies and versions: Kubernetes master 1.13.12-gke.25, nodes 1.13.11-gke.14, NGINX Ingress controller 0.25.1.

Thanks for your help; I don't know what else to try.


2 Answers

Server Fault user

Answered 2020-02-14 20:23:07

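To mitigate the failing health checks, I suggest increasing the timeout values of your health checks and upgrading to 0.26.0, since a fix seems to have been included in that version. I also suggest applying these optimizations to nginx to reduce buffering time. Keep in mind that Google does not support these optimizations.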
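Kubernetes restarts a container after `failureThreshold` consecutive liveness-probe failures; probes run every `periodSeconds`, and each one fails if it gets no answer within `timeoutSeconds`. Raising these values lengthens the stall the controller can survive, which is what the timeout suggestion amounts to. A rough Python sketch of that arithmetic, ignoring scheduling jitter; the defaults in the sketch (10 s period, 1 s timeout, threshold 3) are assumptions, so check the actual `livenessProbe` on the controller Deployment.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Liveness-probe timing; the defaults are assumed values, not read from the cluster."""
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3

    def min_stall_before_restart(self) -> int:
        """Shortest unresponsive period that can get the container restarted.

        Worst case: the stall begins just as a probe fires. The kubelet then needs
        `failure_threshold` consecutive failures, with probes `period_seconds`
        apart and the last one failing after `timeout_seconds`.
        """
        return (self.failure_threshold - 1) * self.period_seconds + self.timeout_seconds

    def to_manifest(self) -> dict:
        """Fields in the shape used under `livenessProbe` in a Pod spec."""
        return {
            "periodSeconds": self.period_seconds,
            "timeoutSeconds": self.timeout_seconds,
            "failureThreshold": self.failure_threshold,
        }


if __name__ == "__main__":
    for probe in (Probe(), Probe(period_seconds=10, timeout_seconds=10, failure_threshold=5)):
        print(probe.to_manifest(), "->", probe.min_stall_before_restart(), "s")
```

With the assumed defaults, a stall of about 21 seconds can already trigger a restart, while the second (hypothetical) setting tolerates about 50 seconds. Since the stalls described in the question last minutes, a longer timeout may only buy time rather than remove the cause.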

Votes: 1

Server Fault user

Answered 2020-02-18 17:09:57

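It seems I have solved the problem. Nginx-ingress also includes ModSecurity (a WAF), which had many rules enabled. After I disabled ModSecurity, the huge logs disappeared, and so far it seems to work. I can now successfully upload a 30 MB file 20 times at once without any logs or disk I/O problems. If this holds up over the longer term, I will update this answer over the weekend.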
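In kubernetes/ingress-nginx, ModSecurity is enabled globally through the `enable-modsecurity` key of the controller ConfigMap (the OWASP Core Rule Set is loaded separately via `enable-owasp-modsecurity-crs`). A minimal Python sketch that builds a merge patch setting that key to `"false"` and prints the corresponding `kubectl patch` command. The ConfigMap name is a hypothetical guess based on the pod name in the logs; list the real ones with `kubectl -n gitlab-managed-apps get configmaps`. In a GitLab-managed installation, the setting may be controlled by GitLab's Helm values and reapplied on upgrade, so treat this as an illustration rather than the recommended path.

```python
import json
import shlex


def modsecurity_patch(enabled: bool) -> dict:
    """Merge patch for the controller ConfigMap's `enable-modsecurity` key."""
    # ConfigMap values must be strings, hence "true"/"false" rather than booleans.
    return {"data": {"enable-modsecurity": "true" if enabled else "false"}}


def kubectl_patch_command(configmap: str, namespace: str, enabled: bool) -> list:
    """argv for `kubectl patch` applying the merge patch above."""
    return [
        "kubectl", "-n", namespace, "patch", "configmap", configmap,
        "--type", "merge",
        "-p", json.dumps(modsecurity_patch(enabled)),
    ]


if __name__ == "__main__":
    # Hypothetical ConfigMap name; verify it in your cluster before running anything.
    cmd = kubectl_patch_command("ingress-nginx-ingress-controller", "gitlab-managed-apps", False)
    print(shlex.join(cmd))
```

Building the command as an argument list avoids shell-quoting problems with the JSON payload; `shlex.join` is used only to display it.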

Votes: 1
Original page content provided by Server Fault; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://serverfault.com/questions/1003089
