I am setting up a 3-node Elasticsearch cluster with Docker. Here is my docker-compose file:
version: '2.0'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0
    environment:
      - cluster.name=test-cluster
      - node.name=elastic_1
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - test_es_cluster_data:/usr/share/elasticsearch/data
    networks:
      - esnet
  elasticsearch2:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_2
    volumes:
      - test_es_cluster2_data:/usr/share/elasticsearch/data
  elasticsearch3:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_3
    volumes:
      - test_es_cluster3_data:/usr/share/elasticsearch/data
volumes:
  test_es_cluster_data:
  test_es_cluster2_data:
  test_es_cluster3_data:
networks:
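  esnet:

Once the cluster is up and running, I kill the master (elastic_1) to test failover. I expect a new master to be elected, and the cluster to keep responding to read requests the whole time.
A new master is elected, but the cluster does not respond for a long time (~45s).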
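One way to put a number on the unresponsive window is to poll a node while the master is stopped. The following is a minimal sketch, not part of the original setup: `http_probe` and `measure_outage` are hypothetical helper names, and it assumes a node's HTTP port is reachable from the host, which the compose file above does not configure.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False


def measure_outage(probe, interval=1.0, max_checks=120,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll probe() and return the length in seconds of the first outage.

    The outage starts at the first failed check and ends at the next
    successful one. Returns None if no complete outage is observed
    within max_checks polls.
    """
    down_since = None
    for _ in range(max_checks):
        ok = probe()
        now = clock()
        if not ok and down_since is None:
            down_since = now          # first failed check
        elif ok and down_since is not None:
            return now - down_since   # service is back
        sleep(interval)
    return None
```

Assuming port 9200 of elastic_2 were published on localhost, calling `measure_outage(lambda: http_probe("http://localhost:9200/_search"))` just before stopping the master would return the approximate outage length. The resolution is limited by the polling interval and the probe timeout.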
Below are the logs from elastic_2 and elastic_3 after the master was stopped (docker stop escluster_elasticsearch_1):
elastic_2:
...
[2018-07-04T14:47:04,495][INFO ][o.e.d.z.ZenDiscovery ] [elastic_2] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,509][WARN ][o.e.c.NodeConnectionsService] [elastic_2] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
...
[2018-07-04T14:47:07,565][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] detected_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])
[2018-07-04T14:47:35,301][WARN ][r.suppressed ] path: /_cat/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
...
[2018-07-04T14:47:53,933][WARN ][o.e.c.s.ClusterApplierService] [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])] took [46.3s] above the warn threshold of 30s
[2018-07-04T14:47:53,934][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5]])
[2018-07-04T14:47:56,931][WARN ][o.e.t.TransportService ] [elastic_2] Received response for a request that has timed out, sent [48367ms] ago, timed out [18366ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}], id [1035]
elastic_3:
[2018-07-04T14:47:04,494][INFO ][o.e.d.z.ZenDiscovery ] [elastic_3] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,519][WARN ][o.e.c.NodeConnectionsService] [elastic_3] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
...
[2018-07-04T14:47:07,550][INFO ][o.e.c.s.MasterService ] [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}
[2018-07-04T14:47:35,026][WARN ][r.suppressed ] path: /_cat/nodes, params: {v=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
...
[2018-07-04T14:47:37,560][WARN ][o.e.d.z.PublishClusterStateAction] [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}{...}{172.24.0.2}{172.24.0.2:9300}])
[2018-07-04T14:47:37,561][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
[2018-07-04T14:47:41,021][WARN ][o.e.c.s.MasterService ] [elastic_3] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[, ]] took [33.4s] above the warn threshold of 30s
[2018-07-04T14:47:41,022][INFO ][o.e.c.s.MasterService ] [elastic_3] zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected), reason: removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}
[2018-07-04T14:47:56,929][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5] source [zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected)]])

Why does the cluster take so long to stabilize and respond to requests?
What puzzles me is the following sequence:
(a) A new master (elastic_3) is elected:
[2018-07-04T14:47:07,550][INFO ] ... [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}...
(b) elastic_2 then detects it:
[2018-07-04T14:47:07,565][INFO ] ... [elastic_2] detected_master {elastic_3}...
(c) The master then times out waiting for the published state to be processed:
[2018-07-04T14:47:37,560][WARN ] ... [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}...])
(d) elastic_2 applies the cluster state and logs a warning:
[2018-07-04T14:47:53,933][WARN ] ... [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}...])] took [46.3s] above the warn threshold of 30s

What could cause the timeout in (c)? Everything runs on a local machine, so there are no network issues. Am I missing some configuration?
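To make the gaps between events (a) to (d) explicit, the timestamps can be turned into offsets from the election. This is a small illustrative sketch: the timestamps are copied from the log lines above, while the labels and the helper name `seconds_since_first` are made up for the example.

```python
from datetime import datetime

# Timestamp layout used in the Elasticsearch log lines quoted above.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"

# (label, timestamp) pairs copied from events (a) to (d).
EVENTS = [
    ("a: elastic_3 elected as master", "2018-07-04T14:47:07,550"),
    ("b: elastic_2 detected the new master", "2018-07-04T14:47:07,565"),
    ("c: elastic_3 publish timeout", "2018-07-04T14:47:37,560"),
    ("d: elastic_2 finished applying the state", "2018-07-04T14:47:53,933"),
]


def seconds_since_first(events):
    """Return (label, offset in seconds from the first event) pairs."""
    parsed = [(label, datetime.strptime(ts, LOG_TS_FORMAT)) for label, ts in events]
    start = parsed[0][1]
    return [(label, round((when - start).total_seconds(), 3)) for label, when in parsed]


for label, offset in seconds_since_first(EVENTS):
    print(f"+{offset:7.3f}s  {label}")
```

The offsets come out as 0.000, 0.015, 30.010 and 46.383 seconds. Event (c) lands almost exactly 30 s after the election, which matches the 30s publish timeout in the log, and (d) matches the 46.3s the applier task reports. So elastic_2 received the new state right away but spent about 46 seconds applying it.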
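Meanwhile, requests to both elastic_2 and elastic_3 fail with MasterNotDiscoveredException. According to the documentation, the cluster is expected to keep responding (https://www.elastic.co/guide/en/elasticsearch/reference/6.3/modules-discovery-zen.html#no-master-block).
Has anyone experienced this? I would appreciate any input on this issue.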
Posted on 2018-07-10 08:40:10
https://stackoverflow.com/questions/51186500