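I'm trying my luck here to solve a problem I'm having on Google Kubernetes Engine.
In short, the problem is this: when I upload a 15-20 MB file through my PHP application, the nginx ingress controller crashes. Disk IO rises quickly, then CPU rises, and it takes about 5-30 minutes until IO and CPU drop again and everything has restarted successfully.
Below are the logs from the nginx ingress controller container, with my comments on what happens:
The upload is received successfully in the app: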
INFO 2020-02-14 14:30:55.481 CET 10.102.1.1 - [10.102.1.1] - - [14/Feb/2020:13:30:55 +0000] "POST /api/v1/contracts/38141/file-system/upload HTTP/2.0" 499 0
NGINX then starts producing a huge number of log lines like this:
INFO 2020-02-14 14:30:55.819 CET *�I�g�*��\u001AnK67�@?+�(%u052f��O�yqq$+u$,�b�<*�9#\t��\u0003d\u0006+����I�]A�%u0110jv��hAp\"�63�9\u0019Q�{�x|K�\u000BE\u001C��\"-P%u0079�\u001Ed�Tv
After many such lines, there are logs saying that the ingress endpoints are unavailable:
WARN 2020-02-14T13:31:05.505984Z Service "gitlab-managed-apps/ingress-nginx-ingress-default-backend" does not have any active Endpoint
WARN 2020-02-14 14:31:05.526 CET Service "my-app/my-app" does not have any active Endpoint.
WARN 2020-02-14 14:31:05.526 CET Service "my-app/app-staging" does not have any active Endpoint.
... skipping access logs ...
WARN 2020-02-14 14:32:34.419 CET failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded
2020-02-14 14:32:42.227 CET attempting to acquire leader lease gitlab-managed-apps/ingress-controller-leader-nginx...
ERROR 2020-02-14 14:32:43.464 CET Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again
Now there are again logs of the file being uploaded from the client, followed by another huge batch of the symbol logs. After these symbol logs, the following is recorded:
INFO 2020-02-14T13:33:37.525466Z Received SIGTERM, shutting down
INFO 2020-02-14T13:33:55.513100Z Received SIGTERM, shutting down
INFO 2020-02-14T13:33:55.513155Z Shutting down controller queues
INFO 2020-02-14T13:33:55.516017Z updating status of Ingress rules (remove)
ERROR 2020-02-14T13:33:55.570340Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
INFO 2020-02-14T13:33:55.574690Z Shutting down controller queues
INFO 2020-02-14T13:33:55.576049Z updating status of Ingress rules (remove)
ERROR 2020-02-14T13:33:55.610722Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:33:55.774881Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
INFO 2020-02-14T13:33:55.776321Z failed to renew lease gitlab-managed-apps/ingress-controller-leader-nginx: failed to tryAcquireOrRenew context deadline exceeded
INFO 2020-02-14T13:33:55.781376Z attempting to acquire leader lease gitlab-managed-apps/ingress-controller-leader-nginx...
INFO 2020-02-14T13:33:56.826124Z successfully acquired lease gitlab-managed-apps/ingress-controller-leader-nginx
INFO 2020-02-14T13:33:56.833827Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh
ERROR 2020-02-14T13:33:56.933107Z queue has been shutdown, failed to enqueue: &ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,ManagedFields:[],}
INFO 2020-02-14T13:33:58.027600Z new leader elected: ingress-nginx-ingress-controller-756f8d9cbb-86xnh
ERROR 2020-02-14T13:33:58.117920Z Failed to update lock: Operation cannot be fulfilled on configmaps "ingress-controller-leader-nginx": the object has been modified; please apply your changes to the latest version and try again
INFO 2020-02-14T13:33:59.709458Z Stopping NGINX process
INFO 2020-02-14T13:33:59.718181Z Stopping NGINX process
ERROR 2020-02-14T13:34:03.010148Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: dial unix /tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:12.627155Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:12.832624Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:13.693853Z healthcheck error: Get http+unix://nginx-status/healthz: read unix @->/tmp/nginx-status-server.sock: i/o timeout
ERROR 2020-02-14T13:34:13.693930Z healthcheck error: Get http+unix://nginx-status/is-dynamic-lb-initialized: read unix @->/tmp/nginx-status-server.sock: i/o timeout
INFO 2020-02-14T13:34:41.620594055Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620664183Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.620671154Z Release: 0.25.1
INFO 2020-02-14T13:34:41.620675964Z Build: git-5179893a9
INFO 2020-02-14T13:34:41.620681055Z Repository: https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.620686042Z nginx version: openresty/1.15.8.1
INFO 2020-02-14T13:34:41.620691348Z
INFO 2020-02-14T13:34:41.620695778Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.620701128Z
INFO 2020-02-14T13:34:41.622564Z Watching for Ingress class: nginx
WARN 2020-02-14T13:34:41.622863Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
INFO 2020-02-14T13:34:41.623360607Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623418446Z NGINX Ingress controller
INFO 2020-02-14T13:34:41.623425256Z Release: 0.25.1
INFO 2020-02-14T13:34:41.623426Z Watching for Ingress class: nginx
INFO 2020-02-14T13:34:41.623430244Z Build: git-5179893a9
INFO 2020-02-14T13:34:41.623435128Z Repository: https://github.com/kubernetes/ingress-nginx/
INFO 2020-02-14T13:34:41.623441533Z nginx version: openresty/1.15.8.1
INFO 2020-02-14T13:34:41.623447006Z
INFO 2020-02-14T13:34:41.623451329Z -------------------------------------------------------------------------------
INFO 2020-02-14T13:34:41.623456382Z
WARN 2020-02-14T13:34:41.623731Z SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
ERROR 2020-02-14T13:34:41.629507140Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.633116Z Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
INFO 2020-02-14T13:34:41.633644Z Creating API client for https://10.103.0.1:443
ERROR 2020-02-14T13:34:41.640959117Z nginx version: openresty/1.15.8.1
WARN 2020-02-14T13:34:41.642065Z Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
INFO 2020-02-14T13:34:41.642376Z Creating API client for https://10.103.0.1:443
INFO 2020-02-14T13:34:41.682018Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64
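INFO 2020-02-14T13:34:41.700374Z Running in Kubernetes cluster version v1.13+ (v1.13.12-gke.25) - git (clean) commit 654de8cac69f1fc5db6f2de0b88d6d027bc15828 - platform linux/amd64
As you can see, nginx has crashed and restarted (I don't know why).
My questions are: What could be causing the nginx health checks to fail and the pod to be terminated? Can I configure the nginx ingress somehow to prevent this? Could it be happening because of the huge amount of logging and the disk failing? Or is it because nginx is buffering the uploaded file and takes too long to respond to the health checks? How can I avoid it?
Here are my nginx annotations. I have tried both with and without them, and it does not work either way: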
nginx.ingress.kubernetes.io/client-body-buffer-size: 5m
nginx.ingress.kubernetes.io/proxy-body-size: 15m
nginx.ingress.kubernetes.io/proxy-buffering: "on"
nginx.org/client-max-body-size: 15m
Technologies and versions:
Kubernetes master version 1.13.12-gke.25
Nodes 1.13.11-gke.14
Nginx ingress controller 0.25.1
Thank you for any help, as I don't know what else to try.
Posted on 2020-02-14 20:23:07
Posted on 2020-02-18 17:09:57
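It looks like I have solved the problem. Nginx-ingress also includes ModSecurity (WAF), which was enabled with many rules. After disabling ModSecurity, the huge logs disappeared, and so far it seems to work. I can now successfully upload twenty 30 MB files at once without any logging or disk I/O problems. If this fix holds up over time, I will update this answer at the end of the week.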
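For reference, here is a minimal sketch of what disabling ModSecurity can look like in the controller's ConfigMap. The keys `enable-modsecurity` and `enable-owasp-modsecurity-crs` are ingress-nginx ConfigMap options. The ConfigMap name below is an assumption, and the namespace is taken from the logs in the question, so check what your own installation uses. If the controller is managed by Helm or by GitLab, the setting probably has to be changed there instead, otherwise it may be overwritten.

```yaml
# Sketch only. The ConfigMap name is an assumption; look up the real one with
#   kubectl -n gitlab-managed-apps get configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-ingress-controller   # assumed name
  namespace: gitlab-managed-apps           # namespace seen in the logs above
data:
  # Turn off the ModSecurity WAF module in the controller
  enable-modsecurity: "false"
  # Turn off the OWASP Core Rule Set that ships with it
  enable-owasp-modsecurity-crs: "false"
```

After the change, the controller should reload its configuration. Uploading a test file while watching the controller logs would show whether the symbol logs are gone.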
https://serverfault.com/questions/1003089