首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >解释postgresql部署中反复发生的崩溃

解释postgresql部署中反复发生的崩溃
EN

Database Administration用户
提问于 2020-05-15 02:32:47
回答 2查看 514关注 0票数 1

我遇到了一些意外的系统崩溃,因为postgresql部署在一个繁忙的生产服务器上(Django 1.8应用程序部署在带有gunicorn应用服务器和nginx反向代理的DO液滴上)。

我是一个偶然的DBA,需要一个专家的意见,以拼凑正在发生的事情。我现在毫无头绪,我将在下面提供各种日志数据:

整个应用程序崩溃--新遗留物最终显示出以下错误:

django.db.utils:OperationalError: ERROR: no more connections allowed (max_client_conn)

完整堆栈跟踪为:

代码语言:javascript
复制
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gevent/baseserver.py", line 26, in _handle_and_close_when_done
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 155, in handle
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 56, in handle
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 114, in handle_request
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 704, in __iter__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 1080, in __call__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 108, in get_response
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/framework_django.py", line 228, in wrapper
File "/home/ubuntu/app/mproject/middleware/mycustommiddleware.py", line 6, in process_request
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/functional.py", line 225, in inner
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/functional.py", line 376, in _setup
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py", line 22, in <lambda>
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py", line 10, in get_user
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/__init__.py", line 174, in get_user
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/backends.py", line 93, in get_user
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 328, in get
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 144, in __len__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 965, in _fetch_all
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 238, in iterator
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 838, in execute_sql
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 164, in cursor
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 135, in _cursor
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 130, in ensure_connection
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/utils.py", line 98, in __exit__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 130, in ensure_connection
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 119, in connect
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 176, in get_new_connection
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/database_dbapi2.py", line 102, in __call__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 126, in connect
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/psycogreen/gevent.py", line 32, in gevent_wait_callback

新文物还显示出Postgres set的大幅增长:

Postgresql慢速日志显示过高的提交时间,如下所示:

代码语言:javascript
复制
2020-05-15 01:22:47.474 UTC [10722] ubuntu@myapp LOG:  duration: 280668.902 ms  statement: COMMIT
2020-05-15 01:22:47.474 UTC [11536] ubuntu@myapp LOG:  duration: 49251.277 ms  statement: COMMIT
2020-05-15 01:22:47.474 UTC [10719] ubuntu@myapp LOG:  duration: 281609.394 ms  statement: COMMIT
2020-05-15 01:22:47.474 UTC [10719] ubuntu@myapp LOG:  could not send data to client: Broken pipe
2020-05-15 01:22:47.474 UTC [10719] ubuntu@myapp FATAL:  connection to client lost
2020-05-15 01:22:47.475 UTC [10672] ubuntu@myapp LOG:  duration: 285371.872 ms  statement: COMMIT

数字海洋的仪表板显示了Load AverageDisk IO的激增(但CPUmemory usagedisk usage仍然很低)。

发生Load Average峰值时的快照统计数据为:

syslog有以下错误日志的无数实例:

代码语言:javascript
复制
May 15 01:22:46 main-app gunicorn[10779]: error: [Errno 111] Connection refused
May 15 01:22:46 main-app gunicorn[10779]: [2020-05-15 01:22:46 +0000] [10874] [ERROR] Socket error processing request.
May 15 01:22:46 main-app gunicorn[10779]: Traceback (most recent call last):
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 66, in handle
May 15 01:22:46 main-app gunicorn[10779]:     six.reraise(*sys.exc_info())
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 56, in handle
May 15 01:22:46 main-app gunicorn[10779]:     self.handle_request(listener_name, req, client, addr)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
May 15 01:22:46 main-app gunicorn[10779]:     addr)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 129, in handle_request
May 15 01:22:46 main-app gunicorn[10779]:     six.reraise(*sys.exc_info())
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 114, in handle_request
May 15 01:22:46 main-app gunicorn[10779]:     for item in respiter:
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 704, in __iter__
May 15 01:22:46 main-app gunicorn[10779]:     for item in self.generator:
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 1080, in __call__
May 15 01:22:46 main-app gunicorn[10779]:     self.start_response)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
May 15 01:22:46 main-app gunicorn[10779]:     response = self.get_response(request)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 218, in get_response
May 15 01:22:46 main-app gunicorn[10779]:     response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/framework_django.py", line 448, in wrapper
May 15 01:22:46 main-app gunicorn[10779]:     return _wrapped(*args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/framework_django.py", line 441, in _wrapped
May 15 01:22:46 main-app gunicorn[10779]:     return wrapped(request, resolver, exc_info)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 256, in handle_uncaught_exception
May 15 01:22:46 main-app gunicorn[10779]:     'request': request
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1193, in error
May 15 01:22:46 main-app gunicorn[10779]:     self._log(ERROR, msg, args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1286, in _log
May 15 01:22:46 main-app gunicorn[10779]:     self.handle(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1296, in handle
May 15 01:22:46 main-app gunicorn[10779]:     self.callHandlers(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1336, in callHandlers
May 15 01:22:46 main-app gunicorn[10779]:     hdlr.handle(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 759, in handle
May 15 01:22:46 main-app gunicorn[10779]:     self.emit(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/log.py", line 129, in emit
May 15 01:22:46 main-app gunicorn[10779]:     self.send_mail(subject, message, fail_silently=True, html_message=html_message)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/log.py", line 132, in send_mail
May 15 01:22:46 main-app gunicorn[10779]:     mail.mail_admins(subject, message, *args, connection=self.connection(), **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/function_trace.py", line 110, in literal_wrapper
May 15 01:22:46 main-app gunicorn[10779]:     return wrapped(*args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/__init__.py", line 98, in mail_admins
May 15 01:22:46 main-app gunicorn[10779]:     mail.send(fail_silently=fail_silently)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/function_trace.py", line 110, in literal_wrapper
May 15 01:22:46 main-app gunicorn[10779]:     return wrapped(*args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/message.py", line 303, in send
May 15 01:22:46 main-app gunicorn[10779]:     return self.get_connection(fail_silently).send_messages([self])
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/backends/smtp.py", line 100, in send_messages
May 15 01:22:46 main-app gunicorn[10779]:     new_conn_created = self.open()
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/backends/smtp.py", line 58, in open
May 15 01:22:46 main-app gunicorn[10779]:     self.connection = connection_class(self.host, self.port, **connection_params)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/smtplib.py", line 256, in __init__
May 15 01:22:46 main-app gunicorn[10779]:     (code, msg) = self.connect(host, port)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/smtplib.py", line 316, in connect
May 15 01:22:46 main-app gunicorn[10779]:     self.sock = self._get_socket(host, port, self.timeout)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/smtplib.py", line 291, in _get_socket
May 15 01:22:46 main-app gunicorn[10779]:     return socket.create_connection((host, port), timeout)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gevent/socket.py", line 96, in create_connection
May 15 01:22:46 main-app gunicorn[10779]:     sock.connect(sa)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gevent/_socket2.py", line 244, in connect
May 15 01:22:46 main-app gunicorn[10779]:     raise error(err, strerror(err))

来自vmtstat 1的一些行:

代码语言:javascript
复制
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 4698028 280600 16692992    0    0     0    14    0    0 13  1 87  0  0
 3  0      0 4697204 280600 16693016    0    0     0    24 8294 8111 12  0 88  0  0
 3  0      0 4698148 280600 16693044    0    0     0   204 8533 9047 15  0 85  0  0
 4  0      0 4697628 280600 16693092    0    0     0    76 8695 8775 17  1 83  0  0
 3  0      0 4697288 280600 16693140    0    0     8    16 9578 9720 19  1 80  0  0
 3  0      0 4695768 280600 16693200    0    0     0   152 8594 8073 16  1 83  0  0
 6  0      0 4686044 280600 16693272    0    0     0    32 7998 7706 12  1 87  0  0
 5  0      0 4681876 280600 16693308    0    0     0   224 9130 8810 14  1 85  0  0

这就是我所有的一切。我不能把可能发生的事情拼凑起来。现在几乎每天都发生一次。专家能解释一下这一点吗?提前感谢!

注意:如果有必要,SELECT version();的收益率:

代码语言:javascript
复制
PostgreSQL 9.6.16 on x86_64-pc-linux-gnu (Ubuntu 9.6.16-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit
EN

回答 2

Database Administration用户

回答已采纳

发布于 2020-05-15 22:30:34

4.6分钟的提交几乎确定您的IO系统严重不足,或者只是完全关闭。这就解释了剩下的问题。如果连接不能提交,它们通常不会被关闭,所以它们就会建立起来。最终你会得到“错误:不再允许连接”。如果客户端放弃等待提交响应并删除连接,您可能会得到“致命的:连接到服务器上丢失的客户端”。注意这里没有PostgreSQL“崩溃”,它仍然在运行和服务连接(慢慢地),直到超过max_connections为止,此时它仍然在为现有的连接服务。

除了fsync之外,还有一些事情是提交所做的,偶尔,但这些事情很少会导致问题。因此,更确切地说,您可以对pg_stat_activity进行示例,并查看在这些场景中提交的wait_event和wait_event_type。如果wait_event是'WALWriteLock‘或'WALSync’,那是非常明确的。

如果你为一个服务水平>“沙鼠”的存储系统付费,我会说这完全落在数字海洋的手中。

票数 3
EN

Database Administration用户

发布于 2020-05-15 05:44:19

提交时间长表明数据库服务器上的I/O严重超载。这台机器似乎要爬起来了,所以它再也不能及时地响应连接请求了。

检查数据库工作负载并查看导致重载的原因。

票数 2
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/267187

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档