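I currently have an application (a Node.js WebSocket server) deployed in production on AWS Elastic Beanstalk, using the Docker environment.
Periodically the container "crashes" (in fact, the container's main process restarts), and I don't know why. /var/log/docker contains these logs (at the exact moment of the event):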
time="2018-12-07T00:48:46Z" level=info msg="shim reaped" id=0af18fa159c07b167a29012b34c6c925c877f98d9a09dcd67078aa6c12f4ef2f
time="2018-12-07T00:48:46.052832134Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2018-12-07T00:48:46Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/0af18fa159c07b167a29012b34c6c925c877f98d9a09dcd67078aa6c12f4ef2f/shim.sock" debug=false pid=9192

CPU and RAM looked fine at that moment. Does anyone have a clue?
Edit: there are other logs as well, but I suspect they show consequences rather than the cause:
/var/log/nginx/error.log:
2018/12/07 00:48:45 [error] 4268#0: *10397 recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: 172.31.43.209, server: , request: "GET /stream?s=000 HTTP/1.1", upstream: "http://172.17.0.2:80/stream?s=000", host: "..."
2018/12/07 00:48:45 [error] 4268#0: *1009 recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: 172.31.43.209, server: , request: "GET /stream?s=000 HTTP/1.1", upstream: "http://172.17.0.2:80/stream?s=000", host: "..."
2018/12/07 00:48:46 [error] 4267#0: *11092 connect() failed (111: Connection refused) while connecting to upstream, client: 172.31.12.149, server: , request: "GET /stream?s=000 HTTP/1.1", upstream: "http://172.17.0.2:80/stream?s=000", host: "..."
/var/log/docker-events.log:
2018-12-07T00:48:46.052880449Z container die 0af18fa159c07b167a29012b34c6c925c877f98d9a09dcd67078aa6c12f4ef2f (exitCode=1, image=2fc4abcada2b, name=inspiring_euler)
2018-12-07T00:48:46.176330610Z network disconnect 94c449d445a5a434af70517a1c8734c540c5c1f9ddbbc1a53a002f25dbc7f581 (container=0af18fa159c07b167a29012b34c6c925c877f98d9a09dcd67078aa6c12f4ef2f, name=bridge, type=bridge)
2018-12-07T00:48:46.626514590Z network connect 94c449d445a5a434af70517a1c8734c540c5c1f9ddbbc1a53a002f25dbc7f581 (container=0af18fa159c07b167a29012b34c6c925c877f98d9a09dcd67078aa6c12f4ef2f, name=bridge, type=bridge)
2018-12-07T00:48:46.869988171Z container start 0af18fa159c07b167a29012b34c6c925c877f98d9a09dcd67078aa6c12f4ef2f (image=2fc4abcada2b, name=inspiring_euler)

Answered 2021-08-07 17:34:13
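This failure may be caused by running the container on a system with THP (Transparent Huge Pages) enabled. That memory management scheme does not agree with the memory allocation pattern of the container, which leads to the failure. A similar problem is reported at https://github.com/containerd/containerd/issues/2202.
Unfortunately, you cannot tune the kernel settings of an Elastic Beanstalk host to fix this. Since MongoDB has a similar problem with THP, a workaround has been written up for MongoDB: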
https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
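Before going down this road, it may be worth confirming that THP is actually enabled on the instance. The following is only a minimal sketch, assuming the standard Linux sysfs path /sys/kernel/mm/transparent_hugepage/enabled, where the kernel marks the active mode with square brackets. The function names parse_thp_mode and read_thp_mode are made up for illustration, and the script would have to be run on the host itself (for example over SSH), not inside the container.

```python
import re
from pathlib import Path

# Standard sysfs location on Linux; assumed to exist on the host in question.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active THP mode, i.e. the word the kernel wraps in [brackets]."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path=THP_ENABLED):
    """Read the sysfs file and return the active mode, or None if unreadable."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    mode = read_thp_mode()
    print(f"THP mode: {mode or 'unknown (file not readable)'}")
```

If this reports "always", the THP theory at least remains plausible; "never" would suggest looking elsewhere, for instance at the application's own output around the exitCode=1 event.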
https://stackoverflow.com/questions/53671452