我正在尝试在Openstack上建立一个Elasticsearch集群。我已经有了两个Openstack实例,每个实例都运行ES,并且这两个实例能够相互ping和curl eachothers实例。但是,无论我如何配置Elasticsearch.yml文件,似乎都无法使它们形成一个集群。
我在这两个实例上都使用了Elasticsearch 7.3.2。我在下面的配置中使用了非浮动in。
实例1- Elasticsearch.yml
cluster.name: my-cluster
node.name: node-1
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [_local_,_site_]
http.port: 9200
discovery.seed_hosts: ["<INSTANCE1-IP>:9300", "<INSTANCE2-IP>:9300"]
cluster.initial_master_nodes: ["<INSTANCE2-IP>:9300"]实例2- Elasticsearch.yml
cluster.name: my-cluster
node.name: node-2
node.master: false
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [_local_,_site_]
http.port: 9200
discovery.seed_hosts: ["<INSTANCE1-IP>:9300", "<INSTANCE2-IP>:9300"]
cluster.initial_master_nodes: ["<INSTANCE2-IP>:9300"]使用这些配置,主节点(Instance1)可以正常加载,但是在检查第二个节点(Instance2)的运行状况时,我得到了一个master_not_discovered_exception (503)。有什么想法吗?
检查node-2上的日志将显示以下信息:
[2019-10-01T09:45:53,126][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [node-2] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2019-10-01T09:45:53,127][WARN ][r.suppressed ] [node-2] path: /_cluster/health, params: {pretty=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$3.onTimeout(TransportMasterNodeAction.java:251) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:572) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) [elasticsearch-7.3.2.jar:7.3.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-10-01T09:45:59,754][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet: have discovered [{node-2}{Sb2fOZEKR42_4sB2XnmShg}{LgJ0DLojSay7KV2_cXgdpw}{<INSTANCE2-IP>}{<INSTANCE2-IP>:9300}{di}{ml.machine_memory=33728778240, xpack.installed=true, ml.max_open_jobs=20}, {node-1}{HqONwd3fQHSWZxxEtctcog}{cu5KW146S8-04oBBCqL3QA}{<INSTANCE1-IP>}{<INSTANCE1-IP>:9300}{dim}{ml.machine_memory=33728778240, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [<INSTANCE2-IP>:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0发布于 2019-10-01 17:13:18
多亏了这个thread,我设法解决了这个问题。它混合了更改一些配置和通过删除节点数据来执行完全ES重启。
我所做的步骤:
1)访问日志,识别错误:
sudo -i
cd /var/log/elasticsearch
cat my-cluster.log[2019-10-01T09:45:53,126][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [node-2] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2019-10-01T09:45:53,127][WARN ][r.suppressed ] [node-2] path: /_cluster/health, params: {pretty=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$3.onTimeout(TransportMasterNodeAction.java:251) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:572) [elasticsearch-7.3.2.jar:7.3.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) [elasticsearch-7.3.2.jar:7.3.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
[2019-10-01T09:45:59,754][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet: have discovered [{node-2}{Sb2fOZEKR42_4sB2XnmShg}{LgJ0DLojSay7KV2_cXgdpw}{<INSTANCE2-IP>}{<INSTANCE2-IP>:9300}{di}{ml.machine_memory=33728778240, xpack.installed=true, ml.max_open_jobs=20}, {node-1}{HqONwd3fQHSWZxxEtctcog}{cu5KW146S8-04oBBCqL3QA}{<INSTANCE1-IP>}{<INSTANCE1-IP>:9300}{dim}{ml.machine_memory=33728778240, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [<INSTANCE2-IP>:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 02)将cluster.initial_master_nodes改为使用节点名而不是节点IP:
cluster.initial_master_nodes: ["node-1"]3)删除已有节点数据,在两个节点上重启Elasticsearch
sudo rm -rf /var/lib/elasticsearch
sudo service elasticsearch restarthttps://stackoverflow.com/questions/58169598
复制相似问题