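Hi, my Galera cluster is currently having problems. It was running fine earlier, but suddenly all nodes went down. I have 3 local nodes and 1 node in the cloud.
During the outage, the logs showed the following: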
2021-09-23 8:24:24 1 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(3):
0: 077cec27-1b75-11ec-842c-5f218e28b692, Strike
1: 25c6e10d-1b75-11ec-9ff7-de60adb87197, unspecified
2: 4e87b761-1b96-11ec-b561-272e3101cb38, Duel
=================================================
2021-09-23 8:24:24 1 [Note] WSREP: Non-primary view
2021-09-23 8:24:24 1 [Note] WSREP: Server status change connected -> connected
2021-09-23 8:24:24 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:24:24 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:24:25 0 [Note] WSREP: (077cec27-842c, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.10.10.10:4567 timed out, no messages seen in PT3S, socket stats: rtt: 1166000 rttvar: 583000 rto: 3498000 lost: 0 last_data_recv: 1835 cwnd: 10 last_queued_since: 1835197309 last_delivered_since: 1835197309 send_queue_length: 0 send_queue_bytes: 0
**This happened for ~90 retries**
2021-09-23 8:29:42 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
2021-09-23 8:29:42 0 [Note] WSREP: Shutdown replication
2021-09-23 8:29:42 0 [Note] WSREP: Server status change connected -> disconnecting
2021-09-23 8:29:42 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:29:42 0 [Note] WSREP: Closing send monitor...
2021-09-23 8:29:42 0 [Note] WSREP: Closed send monitor.
2021-09-23 8:29:42 0 [Note] WSREP: gcomm: terminating thread
2021-09-23 8:29:42 0 [Note] WSREP: gcomm: joining thread
2021-09-23 8:29:42 0 [Note] WSREP: gcomm: closing backend
2021-09-23 8:29:42 0 [Note] WSREP: view(view_id(NON_PRIM,077cec27-842c,167) memb {
077cec27-842c,0
} joined {
} left {
} partitioned {
25c6e10d-9ff7,0
4e87b761-b561,0
ed270aba-aedd,0
})
2021-09-23 8:29:42 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2021-09-23 8:29:42 0 [Note] WSREP: view((empty))
2021-09-23 8:29:42 0 [Note] WSREP: Deferred close timer started for socket with remote endpoint: tcp://10.10.10.20:50904
2021-09-23 8:29:42 0 [Note] WSREP: gcomm: closed
2021-09-23 8:29:42 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2021-09-23 8:29:42 0 [Note] WSREP: Flow-control interval: [16, 16]
2021-09-23 8:29:42 0 [Note] WSREP: Received NON-PRIMARY.
2021-09-23 8:29:42 0 [Note] WSREP: New SELF-LEAVE.
2021-09-23 8:29:42 0 [Note] WSREP: Flow-control interval: [0, 0]
2021-09-23 8:29:42 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2021-09-23 8:29:42 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 9842007)
2021-09-23 8:29:42 0 [Note] WSREP: RECV thread exiting 0: Success
2021-09-23 8:29:42 6 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(1):
0: 077cec27-1b75-11ec-842c-5f218e28b692, Strike
=================================================
2021-09-23 8:29:42 6 [Note] WSREP: Non-primary view
2021-09-23 8:29:42 6 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:29:42 0 [Note] WSREP: recv_thread() joined.
2021-09-23 8:29:42 0 [Note] WSREP: Closing replication queue.
2021-09-23 8:29:42 0 [Note] WSREP: Closing slave action queue.
2021-09-23 8:29:42 6 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: yes
own_index: -1
members(0):
=================================================
The same happened on my second node:
2021-09-23 8:31:01 7 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(1):
0: 25c6e10d-1b75-11ec-9ff7-de60adb87197, Aegis
=================================================
2021-09-23 8:31:01 7 [Note] WSREP: Non-primary view
2021-09-23 8:31:01 7 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:31:01 0 [Note] WSREP: recv_thread() joined.
2021-09-23 8:31:01 0 [Note] WSREP: Closing replication queue.
2021-09-23 8:31:01 0 [Note] WSREP: Closing slave action queue.
2021-09-23 8:31:01 7 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: yes
own_index: -1
members(0):
=================================================
Third node:
2021-09-23 8:30:59 2 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(1):
0: 4e87b761-1b96-11ec-b561-272e3101cb38, Duel
=================================================
2021-09-23 8:30:59 2 [Note] WSREP: Non-primary view
2021-09-23 8:30:59 2 [Note] WSREP: Server status change connected -> connected
2021-09-23 8:30:59 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:30:59 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:31:03 0 [Note] WSREP: (4e87b761-b561, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.10.10.10:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 385031380 cwnd: 1 last_queued_since: 385331379302597 last_delivered_since: 385331379302597 send_queue_length: 0 send_queue_bytes: 0
2021-09-23 8:31:04 0 [Note] WSREP: (4e87b761-b561, 'tcp://0.0.0.0:4567') reconnecting to ed270aba-aedd (tcp://10.10.10.10:4567), attempt 90
2021-09-23 8:31:05 0 [Note] WSREP: cleaning up 25c6e10d-9ff7 (tcp://10.10.10.20:4567)
2021-09-23 8:31:08 0 [Note] WSREP: (4e87b761-b561, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.10.10.41:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 385036380 cwnd: 1 last_queued_since: 385336379962010 last_delivered_since: 385336379962010 send_queue_length: 0 send_queue_bytes: 0
2021-09-23 8:31:49 2 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: yes
own_index: -1
members(0):
=================================================
As for my cloud node, this is what happened:
2021-09-23 8:29:07 0 [Note] WSREP: (ed270aba-aedd, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.10.10.30:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 684592357 cwnd: 1 last_queued_since: 684892356898817 last_delivered_since: 684892356898817 send_queue_length: 0 send_queue_bytes: 0
2021-09-23 8:29:07 0 [Note] WSREP: (ed270aba-aedd, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.10.10.40:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 684592358 cwnd: 1 last_queued_since: 684892357027577 last_delivered_since: 684892357027577 send_queue_length: 0 send_queue_bytes: 0
2021-09-23 8:29:09 0 [Note] WSREP: (ed270aba-aedd, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.10.10.20:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 684593857 cwnd: 1 last_queued_since: 684893856941066 last_delivered_since: 684893856941066 send_queue_length: 0 send_queue_bytes: 0
2021-09-23 8:32:28 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
2021-09-23 8:32:28 0 [Note] WSREP: Shutdown replication
2021-09-23 8:32:28 0 [Note] WSREP: Server status change connected -> disconnecting
2021-09-23 8:32:28 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-09-23 8:32:28 0 [Note] WSREP: Closing send monitor...
2021-09-23 8:32:28 0 [Note] WSREP: Closed send monitor.
2021-09-23 8:32:28 0 [Note] WSREP: gcomm: terminating thread
2021-09-23 8:32:28 0 [Note] WSREP: gcomm: joining thread
2021-09-23 8:32:28 0 [Note] WSREP: gcomm: closing backend
2021-09-23 8:32:28 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2021-09-23 8:32:28 0 [Note] WSREP: view((empty))
2021-09-23 8:32:28 0 [Note] WSREP: gcomm: closed
2021-09-23 8:32:28 0 [Note] WSREP: New SELF-LEAVE.
2021-09-23 8:32:28 0 [Note] WSREP: Flow-control interval: [0, 0]
2021-09-23 8:32:28 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2021-09-23 8:32:28 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 9842007)
2021-09-23 8:32:28 0 [Note] WSREP: RECV thread exiting 0: Success
2021-09-23 8:32:28 0 [Note] WSREP: recv_thread() joined.
2021-09-23 8:32:28 0 [Note] WSREP: Closing replication queue.
2021-09-23 8:32:28 0 [Note] WSREP: Closing slave action queue.
2021-09-23 8:32:28 2 [Note] WSREP: ================================================
View:
id: d326832d-56e2-11eb-80c1-760504343273:9842007
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: yes
own_index: -1
members(0):
=================================================
I have the following configuration:
Node 1:
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
#add your node ips here
wsrep_cluster_address="gcomm://strike,aegis,duel,clone"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
#Cluster name
wsrep_cluster_name="cloud_test_cluster"
# Allow server to accept connections on all interfaces.
bind-address=0.0.0.0
# this server ip, change for each server
wsrep_node_address="strike"
# this server name, change for each server
wsrep_node_name="Strike"
wsrep_sst_method=rsync
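wsrep_sst_donor="Aegis,Duel"
Node 2 is the same as above, except:
wsrep_sst_donor="Strike,Duel"
Node 3:
wsrep_sst_donor="Strike,Aegis"
Finally, the cloud node:
wsrep_sst_donor="Duel,Aegis,Strike"
All of this happened at almost the same time. Did the nodes simply lose contact with each other? Did losing contact with 10.10.10 cause the crash? Why did the number of members go down? This happened both yesterday and today. I have had this setup running since last weekend, and there were no problems until yesterday.
Can someone explain to me what happened?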
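Posted on 2021-09-24 00:10:57
Your Strike node:
2021-09-23 8:29:42 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
So it looks like the service was simply sent a SIGTERM to shut it down.
The log entries from the other nodes come after Strike's shutdown.
Note that all nodes are non-primary. If this is a startup after a power failure, see the documentation on restarting the cluster: one of the nodes has to be manually designated as the new primary.
This involves identifying the most up-to-date node (singular) and running galera_recovery on that node only. Then start the other nodes.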
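Identifying the most up-to-date node usually comes down to comparing the `seqno` each node saved in its `grastate.dat` file. The following is a minimal sketch of that comparison, assuming the file contents have already been collected from each node. The helper names `parse_grastate` and `most_advanced` are hypothetical, and the seqno values in the sample are made up for illustration, not taken from the logs above. A `seqno` of `-1` means the position was not saved at shutdown and would have to be recovered on that node first.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def most_advanced(states):
    """Return the node name with the highest saved seqno.

    `states` maps a node name to the text of its grastate.dat.
    Nodes with seqno -1 (position not saved) are ignored.
    Returns None if no node has a usable seqno.
    On a tie, the first node encountered is returned; tied nodes
    are equally up to date.
    """
    seqnos = {name: int(parse_grastate(text)["seqno"])
              for name, text in states.items()}
    known = {name: s for name, s in seqnos.items() if s >= 0}
    if not known:
        return None
    return max(known, key=known.get)


# Illustrative input only: these seqno values are invented.
SAMPLE = {
    "Strike": "# GALERA saved state\nversion: 2.1\n"
              "uuid:    d326832d-56e2-11eb-80c1-760504343273\n"
              "seqno:   105\nsafe_to_bootstrap: 0\n",
    "Aegis":  "# GALERA saved state\nversion: 2.1\n"
              "uuid:    d326832d-56e2-11eb-80c1-760504343273\n"
              "seqno:   103\nsafe_to_bootstrap: 0\n",
    "Duel":   "# GALERA saved state\nversion: 2.1\n"
              "uuid:    d326832d-56e2-11eb-80c1-760504343273\n"
              "seqno:   -1\nsafe_to_bootstrap: 0\n",
}

print(most_advanced(SAMPLE))
```

Only the node this selects should be used to bring the cluster back up; the remaining nodes are started afterwards so they rejoin it and sync from it.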
Perhaps Strike was terminated because it did not reach primary state quickly enough.
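As for why a node that loses contact with its peers reports non-primary: Galera keeps a component primary only while it holds a strict majority of the previous primary component. The sketch below is a simplified version of that rule, assuming every node has the default weight of 1. The function name `has_quorum` is hypothetical, not part of any Galera API.

```python
def has_quorum(reachable, previous, left_gracefully=0):
    """Simplified majority check, assuming each node has weight 1.

    reachable       -- nodes in the current component (including this one)
    previous        -- nodes in the last primary component
    left_gracefully -- nodes known to have left cleanly since then

    A component stays primary only with strictly more than half of
    the remaining membership; an exact half is not enough.
    """
    return 2 * reachable > (previous - left_gracefully)


# A 4-node cluster, as in the question (3 local nodes + 1 cloud node).
for reachable in (1, 2, 3, 4):
    print(reachable, "of 4 reachable -> primary:", has_quorum(reachable, 4))
```

Under this rule a node that can see only itself out of four, as in the `memb_num = 1` component in Strike's log, cannot be primary, and two nodes out of four are not enough either.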
https://dba.stackexchange.com/questions/300022