
ArangoDB cluster restart fails

Stack Overflow user
Asked on 2018-03-27 16:28:13
1 answer · 426 views · 0 followers · 2 votes

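We set up an ArangoDB cluster with 3 agents, 5 coordinators, and 5 DBServers on 5 servers.

Environment: CentOS 6

In our experience, the whole cluster fails if memory usage exceeds the maximum on any one of the servers. We found no way to limit memory usage, so to avoid this we periodically check each node with top | grep arangod and restart a node when it consumes too much. This usually works well. However, when we tried to restart one node, we got the following log: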

2018/03/27 15:47:31 Failed to get master URL, retrying in 5sec (All 3 servers responded with temporary failure)
2018/03/27 15:47:31 ## Start of dbserver log
        2018-03-27T07:46:31Z [37755] WARNING {memory} It is recommended to set NUMA to interleaved.
        2018-03-27T07:46:31Z [37755] WARNING {memory} put 'numactl --interleave=all' in front of your command
        2018-03-27T07:46:31Z [37755] INFO using storage engine rocksdb
        2018-03-27T07:46:31Z [37755] INFO {cluster} Starting up with role PRIMARY
        2018-03-27T07:46:41Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 21 (9.84s). Network checks advised.
        2018-03-27T07:46:42Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 22 (10.82s). Network checks advised.
        2018-03-27T07:46:43Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 23 (11.89s). Network checks advised.
        2018-03-27T07:46:44Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 24 (13.03s). Network checks advised.
        2018-03-27T07:46:46Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 25 (14.25s). Network checks advised.
        2018-03-27T07:46:47Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 26 (15.57s). Network checks advised.
        2018-03-27T07:46:48Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 27 (16.99s). Network checks advised.
        2018-03-27T07:46:50Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 28 (18.51s). Network checks advised.
        2018-03-27T07:46:51Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 29 (20.15s). Network checks advised.
        2018-03-27T07:46:53Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 30 (21.9s). Network checks advised.
        2018-03-27T07:46:55Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 31 (23.8s). Network checks advised.
        2018-03-27T07:46:57Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 32 (25.83s). Network checks advised.
        2018-03-27T07:46:59Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 33 (28.01s). Network checks advised.
        2018-03-27T07:47:02Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 34 (30.36s). Network checks advised.
        2018-03-27T07:47:04Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 35 (32.89s). Network checks advised.
        2018-03-27T07:47:04Z [37755] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 36 (32.89s). Network checks advised.
2018/03/27 15:47:31 ## End of dbserver log
2018/03/27 15:47:32 ## Start of coordinator log
        2018-03-27T07:46:32Z [37769] WARNING {memory} It is recommended to set NUMA to interleaved.
        2018-03-27T07:46:32Z [37769] WARNING {memory} put 'numactl --interleave=all' in front of your command
        2018-03-27T07:46:32Z [37769] INFO using storage engine rocksdb
        2018-03-27T07:46:32Z [37769] INFO {cluster} Starting up with role COORDINATOR
        2018-03-27T07:46:42Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 21 (9.84s). Network checks advised.
        2018-03-27T07:46:43Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 22 (10.82s). Network checks advised.
        2018-03-27T07:46:44Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 23 (11.89s). Network checks advised.
        2018-03-27T07:46:45Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 24 (13.03s). Network checks advised.
        2018-03-27T07:46:47Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 25 (14.25s). Network checks advised.
        2018-03-27T07:46:48Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 26 (15.57s). Network checks advised.
        2018-03-27T07:46:49Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 27 (16.99s). Network checks advised.
        2018-03-27T07:46:51Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 28 (18.51s). Network checks advised.
        2018-03-27T07:46:52Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 29 (20.14s). Network checks advised.
        2018-03-27T07:46:54Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 30 (21.9s). Network checks advised.
        2018-03-27T07:46:56Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 31 (23.8s). Network checks advised.
        2018-03-27T07:46:58Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 32 (25.83s). Network checks advised.
        2018-03-27T07:47:00Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.30:8531. Unsuccessful consecutive tries: 33 (28.01s). Network checks advised.
        2018-03-27T07:47:03Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 34 (30.36s). Network checks advised.
        2018-03-27T07:47:05Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.29:8531. Unsuccessful consecutive tries: 35 (32.89s). Network checks advised.
        2018-03-27T07:47:05Z [37769] INFO {agencycomm} Flaky agency communication to http+tcp://65.18.27.28:8531. Unsuccessful consecutive tries: 36 (32.89s). Network checks advised.
2018/03/27 15:47:32 ## End of coordinator log
2018/03/27 15:47:46 Failed to get master URL, retrying in 5sec (All 3 servers responded with temporary failure)

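All the servers can connect to each other without problems, so this is not a network issue.

While I was writing this question and collecting the log output, the cluster restarted successfully, which is a bit strange. Now 2 of the nodes print this log message: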
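A claim like this can be checked against the agency port directly. The following is a minimal sketch, not part of the original post: it only tests whether a TCP connection to each agency endpoint from the log (port 8531) can be opened. The function names are hypothetical. A successful connection shows only that the port is reachable, not that the agency is healthy; the starter message "All 3 servers responded with temporary failure" suggests the servers answered but were not ready.

```python
import socket

# Agency endpoints copied from the log above (port 8531).
AGENCY_ENDPOINTS = [
    ("65.18.27.28", 8531),
    ("65.18.27.29", 8531),
    ("65.18.27.30", 8531),
]


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=2.0):
    """Map each "host:port" string to its TCP reachability."""
    return {f"{host}:{port}": can_connect(host, port, timeout)
            for host, port in endpoints}
```

Running check_endpoints(AGENCY_ENDPOINTS) on each of the 5 servers would show whether every node can reach every agent at the TCP level.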

updated cluster config does not contain myself. rejecting

Listing the collections now takes a very long time, and the cluster does not work properly. Does anyone know why?


1 Answer

Stack Overflow user

Answered on 2020-09-03 17:36:57

Quoting from the GitHub discussion:

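Please note that the option --cluster.agency-size 5 can only be used when you start the cluster for the first time. On the first start, the starter writes a cluster configuration that cannot be changed afterwards.

So in your case, if you need to add more agents on additional nodes, you have to use --cluster.start-agent true on each new node. If you want to be sure that your 5-node cluster stays up and running when two (random) nodes are down, agency size = 5 is what you need.

The cluster does not work if the agency is not up and running. The agency uses the RAFT protocol. If your agency consists of 3 agents and two of them are down, the agency is down (and so is your cluster). If your agency consists of 5 agents and two of them are down, the agency survives (and so does your cluster).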
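The arithmetic behind these two cases is the usual majority rule of Raft-style consensus: the agency stays available as long as more than half of its agents are up. A minimal sketch, with function names chosen here for illustration:

```python
def quorum(agents: int) -> int:
    """Smallest number of agents that forms a majority."""
    return agents // 2 + 1


def tolerated_failures(agents: int) -> int:
    """How many agents may be down while a majority is still up."""
    return agents - quorum(agents)


def agency_available(agents: int, down: int) -> bool:
    """True if the remaining agents still form a majority."""
    return agents - down >= quorum(agents)


# 3 agents tolerate 1 failure; 5 agents tolerate 2.
print(tolerated_failures(3), tolerated_failures(5))
# Two agents down: a 3-agent agency is down, a 5-agent agency survives.
print(agency_available(3, 2), agency_available(5, 2))
```

This also shows why an even agency size gains nothing: 4 agents tolerate the same single failure as 3.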

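Other setups are also possible if you want to survive with 3 machines down.

You can also consider using separate machines for the agency, for example:

  • 3 dedicated machines for the agency plus another 3 machines for DBServers+Coordinators (6 machines in total), with replication factor = 3

The setup above survives 1 agent down and 2 DBServers down (so 3 machines down in total).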
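The 1 + 2 figure for this layout can be reproduced with a short calculation. This sketch rests on two assumptions that are not spelled out in the answer: the agency needs a majority of its agents, and with replication factor r every shard has r copies on distinct DBServers, so at least one copy remains after r - 1 DBServer failures. The Layout class is hypothetical and only models the counting.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    agents: int              # dedicated agency machines
    dbservers: int           # machines running DBServer + Coordinator
    replication_factor: int  # copies kept of each shard

    def agent_failures_tolerated(self) -> int:
        # A majority of agents must stay up.
        return (self.agents - 1) // 2

    def dbserver_failures_tolerated(self) -> int:
        # Assumption: copies sit on distinct DBServers, so one copy
        # survives the loss of (copies - 1) servers.
        copies = min(self.replication_factor, self.dbservers)
        return copies - 1

    def machines_down_tolerated(self) -> int:
        # Best case only: failures must be split exactly this way.
        return (self.agent_failures_tolerated()
                + self.dbserver_failures_tolerated())


layout = Layout(agents=3, dbservers=3, replication_factor=3)
print(layout.agent_failures_tolerated(),     # agents that may fail
      layout.dbserver_failures_tolerated(),  # DBServers that may fail
      layout.machines_down_tolerated())      # total in the best case
```

The total is a best case: it holds only when the failures are split as 1 agent and 2 DBServers. Two agents down would still take the agency, and with it the cluster, offline.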

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/49507915
