Elasticsearch: "timed out waiting for all nodes to process published state" and cluster unavailability

Asked by a Stack Overflow user on 2018-07-05 08:22:12

I'm setting up a 3-node Elasticsearch cluster with Docker. Here is my docker-compose file:

```yaml
version: '2.0'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0
    environment:
      - cluster.name=test-cluster
      - node.name=elastic_1
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - test_es_cluster_data:/usr/share/elasticsearch/data
    networks:
      - esnet
  elasticsearch2:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_2
    volumes:
      - test_es_cluster2_data:/usr/share/elasticsearch/data
  elasticsearch3:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_3
    volumes:
      - test_es_cluster3_data:/usr/share/elasticsearch/data
volumes:
  test_es_cluster_data:
  test_es_cluster2_data:
  test_es_cluster3_data:
networks:
  esnet:
```
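The `discovery.zen.minimum_master_nodes=2` value follows the quorum formula that the 6.x Zen discovery documentation recommends: (number of master-eligible nodes / 2) + 1, rounded down. A minimal Python sketch of that arithmetic, assuming all three nodes are master-eligible (the default, since the file sets no `node.master`); the function name is only illustrative:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Zen discovery quorum rule of thumb for ES 6.x: (master_eligible // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Three master-eligible nodes (elastic_1, elastic_2, elastic_3) -> quorum of 2.
    print(minimum_master_nodes(3))  # 2
```

With elastic_1 stopped, the two remaining nodes still meet a quorum of 2, which is consistent with the logs showing elastic_3 being elected.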

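Once the cluster is up and running, I kill the master (elastic_1) to test failover. I expect a new master to be elected and the cluster to keep responding to read requests throughout.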

A new master is elected, but the cluster stops responding for a long time (~45 s).
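The ~45 s window can be measured rather than estimated by polling the cluster while the master is being stopped. A minimal standard-library Python sketch; `HEALTH_URL` is an assumption, since the compose file publishes no ports, so it presumes a mapping such as `9200:9200` was added or that the script runs inside the `esnet` network. The helper names are hypothetical:

```python
from __future__ import annotations

import time
import urllib.request

# Hypothetical endpoint: assumes port 9200 of one node is reachable from here.
HEALTH_URL = "http://localhost:9200/_cat/health"


def probe_once(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the node answered with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (e.g. a 503 for MasterNotDiscoveredException)
        # and socket timeouts are all OSError subclasses: count as unavailable.
        return False


def sample(duration_s: float, interval_s: float = 0.5) -> list[tuple[float, bool]]:
    """Probe repeatedly for duration_s seconds; return (monotonic_time, ok) pairs."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe_once()))
        time.sleep(interval_s)
    return samples


def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Seconds from the first failed probe of a run to the next successful one.

    A run that never recovers is measured up to the last sample.
    """
    longest = 0.0
    start = None
    for t, ok in samples:
        if not ok and start is None:
            start = t
        elif ok and start is not None:
            longest = max(longest, t - start)
            start = None
    if start is not None and samples:
        longest = max(longest, samples[-1][0] - start)
    return longest
```

For example, start `longest_outage(sample(120))` in one terminal and run `docker stop escluster_elasticsearch_1` in another while it is sampling.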

Below are the logs from elastic_2 and elastic_3 after the master was stopped (`docker stop escluster_elasticsearch_1`):

elastic_2:

```text
...
[2018-07-04T14:47:04,495][INFO ][o.e.d.z.ZenDiscovery     ] [elastic_2] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,509][WARN ][o.e.c.NodeConnectionsService] [elastic_2] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
    ...
[2018-07-04T14:47:07,565][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] detected_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])
[2018-07-04T14:47:35,301][WARN ][r.suppressed             ] path: /_cat/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
    ...
[2018-07-04T14:47:53,933][WARN ][o.e.c.s.ClusterApplierService] [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])] took [46.3s] above the warn threshold of 30s
[2018-07-04T14:47:53,934][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5]])
[2018-07-04T14:47:56,931][WARN ][o.e.t.TransportService   ] [elastic_2] Received response for a request that has timed out, sent [48367ms] ago, timed out [18366ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}], id [1035]
```
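The last warning from elastic_2 carries enough information to reconstruct its own timing: the `master_ping` to elastic_3 was sent 48367 ms before the response arrived and timed out 18366 ms before it, so it waited about 30 s before giving up. That is consistent with the 30 s default of Zen's fault-detection `discovery.zen.fd.ping_timeout`, although the log alone does not prove which setting applied. A small Python sketch of the subtraction (the helper name is illustrative):

```python
import re

# Excerpt of the TransportService warning logged by elastic_2 above.
LINE = ("Received response for a request that has timed out, "
        "sent [48367ms] ago, timed out [18366ms] ago, "
        "action [internal:discovery/zen/fd/master_ping]")


def effective_timeout_ms(log_line: str) -> int:
    """Milliseconds between sending a request and its timing out."""
    sent = int(re.search(r"sent \[(\d+)ms\] ago", log_line).group(1))
    timed_out = int(re.search(r"timed out \[(\d+)ms\] ago", log_line).group(1))
    return sent - timed_out


if __name__ == "__main__":
    print(effective_timeout_ms(LINE))  # 30001
```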

elastic_3:

```text
[2018-07-04T14:47:04,494][INFO ][o.e.d.z.ZenDiscovery     ] [elastic_3] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,519][WARN ][o.e.c.NodeConnectionsService] [elastic_3] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
    ...
[2018-07-04T14:47:07,550][INFO ][o.e.c.s.MasterService    ] [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}
[2018-07-04T14:47:35,026][WARN ][r.suppressed             ] path: /_cat/nodes, params: {v=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
    ...
[2018-07-04T14:47:37,560][WARN ][o.e.d.z.PublishClusterStateAction] [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}{...}{172.24.0.2}{172.24.0.2:9300}])
[2018-07-04T14:47:37,561][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
[2018-07-04T14:47:41,021][WARN ][o.e.c.s.MasterService    ] [elastic_3] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[, ]] took [33.4s] above the warn threshold of 30s
[2018-07-04T14:47:41,022][INFO ][o.e.c.s.MasterService    ] [elastic_3] zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected), reason: removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}
[2018-07-04T14:47:56,929][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5] source [zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected)]])
```

Why does the cluster take so long to stabilize and respond to requests?

What puzzles me is the following sequence:

(a) A new master (elastic_3) is elected:

```text
[2018-07-04T14:47:07,550][INFO ] ... [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}...
```

(b) Then elastic_2 detects the new master:

```text
[2018-07-04T14:47:07,565][INFO ] ... [elastic_2] detected_master {elastic_3}...
```

(c) Then the master times out waiting for the published state to be processed:

```text
[2018-07-04T14:47:37,560][WARN ] ... [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}...])
```

(d) elastic_2 applies the cluster state and logs a warning:

```text
[2018-07-04T14:47:53,933][WARN ] ... [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}...])] took [46.3s] above the warn threshold of 30s
```
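Laying the four timestamps side by side shows the shape of the delay: (c) fires almost exactly 30 s after (a), matching the `timeout [30s]` printed in the message, and (d) lands about 46.4 s after (b), matching the `took [46.3s]` the applier reports. Read together, the logs suggest elastic_2 was still applying state version 4 when the master's 30 s wait expired. A small Python sketch of the subtraction, with the timestamps copied from the excerpts and the dictionary keys chosen only for illustration:

```python
from datetime import datetime

# Elasticsearch log timestamp format, e.g. 2018-07-04T14:47:07,550
FMT = "%Y-%m-%dT%H:%M:%S,%f"

# Timestamps copied from excerpts (a)-(d).
EVENTS = {
    "a_elected": "2018-07-04T14:47:07,550",
    "b_detected": "2018-07-04T14:47:07,565",
    "c_publish_timeout": "2018-07-04T14:47:37,560",
    "d_applier_warning": "2018-07-04T14:47:53,933",
}


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two named events in EVENTS."""
    delta = datetime.strptime(EVENTS[end], FMT) - datetime.strptime(EVENTS[start], FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    print(f"(a) -> (c): {seconds_between('a_elected', 'c_publish_timeout'):.3f} s")  # 30.010 s
    print(f"(b) -> (d): {seconds_between('b_detected', 'd_applier_warning'):.3f} s")  # 46.368 s
```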

What could cause the timeout in (c)? Everything runs on my local machine (no network issues). Am I missing some configuration?

Meanwhile, requests to both elastic_2 and elastic_3 fail with MasterNotDiscoveredException. According to the documentation, the cluster should keep responding (https://www.elastic.co/guide/en/elasticsearch/reference/6.3/modules-discovery-zen.html#no-master-block).
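The linked section describes the default `discovery.zen.no_master_block: write`, under which reads keep being served while writes are rejected. Both paths in the logs are `_cat` endpoints, and both failed with MasterNotDiscoveredException; probing a plain document read alongside them would show whether reads were affected too. A minimal Python sketch for classifying error bodies, assuming the usual JSON error shape in which `error.type` is the snake_case exception name (the function name is hypothetical):

```python
import json


def is_master_not_discovered(body: str) -> bool:
    """True if an Elasticsearch JSON error body reports master_not_discovered_exception.

    Assumes the shape {"error": {"type": ...}, "status": ...}; any other
    body (including non-JSON) returns False.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("type") == "master_not_discovered_exception"
```

Counting failures by this predicate separately from other failures makes it easier to compare the observed behaviour with what the no-master-block documentation describes.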

Has anyone run into this? I'd appreciate any input on this issue.


1 Answer

Answered by a Stack Overflow user on 2018-07-10 08:40:10

Using `docker restart` instead of `docker stop` solved the problem. See: https://discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590
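To compare the two approaches directly, the failover step can be parameterised on the docker subcommand so the same outage probe runs once with `docker stop` and once with `docker restart`. A hypothetical Python helper, assuming the container name from the question and a `docker` CLI on the PATH; the answer reports that restart avoided the delay but does not say why, so the linked discussion is the place to look for the cause:

```python
from __future__ import annotations

import subprocess

# Container name taken from the question; adjust for your compose project.
CONTAINER = "escluster_elasticsearch_1"


def failover_command(container: str, mode: str) -> list[str]:
    """Build the docker CLI call that takes the current master down."""
    if mode not in ("stop", "restart"):
        raise ValueError("mode must be 'stop' or 'restart'")
    return ["docker", mode, container]


def take_down_master(mode: str = "restart") -> None:
    # check=True raises CalledProcessError if docker exits with a non-zero status.
    subprocess.run(failover_command(CONTAINER, mode), check=True)
```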

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/51186500
