
Full SST does not start on a Galera node ("WSREP: Prepared SST request" is missing)

Stack Overflow user
Asked 2017-06-30 13:03:28
1 answer, 1.9K views, 0 followers, 0 votes

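I have a Galera cluster (10.0.27) with 3 nodes, each on its own dedicated server. After one of the servers was rebooted, its node can no longer join the cluster or perform a full SST. It is as if mysql "misses" launching some commands.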

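I have a second "development" cluster with exactly the same configuration, and adding a node there works without problems. When I add a node that needs a full SST, I see the following difference between the working cluster and the non-working one: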

Node joining the working cluster:

 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Quorum results: 
 11:44:52  mysqld: #011version    = 4, 
 11:44:52  mysqld: #011component  = PRIMARY, 
 11:44:52  mysqld: #011conf_id    = 8, 
 11:44:52  mysqld: #011members    = 2/3 (joined/total), 
 11:44:52  mysqld: #011act_id     = 906976, 
 11:44:52  mysqld: #011last_appl. = -1, 
 11:44:52  mysqld: #011protocols  = 0/7/3 (gcs/repl/appl), 
 11:44:52  mysqld: #011group UUID = 27ba4c4f-9b78-11e6-824c-f3b1e60fa202 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Flow-control interval: [28, 28] 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 906976) 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: State transfer required: 
 11:44:52  mysqld: #011Group state: 27ba4c4f-9b78-11e6-824c-f3b1e60fa202:906976 
 11:44:52  mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: New cluster view: global state: 27ba4c4f-9b78-11e6-824c-f3b1e60fa202:906976, view# 9: Primary, number of nodes: 3, my index: 2, protocol version 3 
 11:44:52  mysqld: 170628 11:44:52 [Warning] WSREP: Gap in state sequence. Need state transfer. 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.***.***.**2' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '16472' --binlog '/var/log/mysql/mariadb-bin' ' 
 11:44:52  rsyncd[16514]: rsyncd version 3.1.1 starting, listening on port 4444
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Prepared SST request: rsync|192.***.***.**2:4444/rsync_sst 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: REPL Protocols: 7 (3, 2) 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Assign initial position for certification: 906976, protocol version: 3 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Service thread queue flushed. 
 11:44:52  mysqld: 170628 11:44:52 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (27ba4c4f-9b78-11e6-824c-f3b1e60fa202): 1 (Operation not permitted) 
 11:44:52  mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Member 2.0 (server-3) requested state transfer from '*any*'. Selected 0.0 (server1)(SYNCED) as donor. 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 906977) 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: Requesting state transfer: success, donor: 0 
 11:44:52  mysqld: 170628 11:44:52 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(27ba4c4f-9b78-11e6-824c-f3b1e60fa202:906976) 
 11:44:52  rsyncd[16531]: name lookup failed for 192.***.***.**1: Name or service not known 
 11:44:52  rsyncd[16531]: connect from UNKNOWN (192.***.***.**1) 
 11:44:52  rsyncd[16531]: rsync to rsync_sst/ from UNKNOWN (192.***.***.**1) 
 11:44:52  rsyncd[16531]: receiving file list 
 11:44:54  rsyncd[16553]: name lookup failed for 192.***.***.**1: Name or service not known 
 11:44:54  rsyncd[16553]: connect from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16531]: sent 114 bytes  received 146847600 bytes  total size 146810880 
 11:44:54  rsyncd[16553]: rsync to rsync_sst-log_dir/ from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16553]: receiving file list 
 11:44:54  rsyncd[16553]: sent 63 bytes  received 100688095 bytes  total size 100663296 
 11:44:54  rsyncd[16559]: name lookup failed for 192.***.***.**1: Name or service not known 
 11:44:54  rsyncd[16559]: connect from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16560]: name lookup failed for 192.***.***.**1: Name or service not known 
 11:44:54  rsyncd[16560]: connect from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16561]: name lookup failed for 192.***.***.**1: Name or service not known 
 11:44:54  rsyncd[16561]: connect from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16562]: name lookup failed for 192.***.***.**1: Name or service not known 
 11:44:54  rsyncd[16562]: connect from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16559]: rsync to rsync_sst/./db_1 from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16562]: rsync to rsync_sst/./db_2 from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16560]: rsync to rsync_sst/./db_3 from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16561]: rsync to rsync_sst/./db_3 from UNKNOWN (192.***.***.**1) 
 11:44:54  rsyncd[16560]: receiving file list
...

Node joining the non-working cluster:

 13:36:28  mysqld: 170630 13:36:28 [Note] WSREP: Quorum results: 
 13:36:28   mysqld: #011version    = 4, 
 13:36:28   mysqld: #011component  = PRIMARY, 
 13:36:28   mysqld: #011conf_id    = 514, 
 13:36:28   mysqld: #011members    = 2/3 (joined/total), 
 13:36:28   mysqld: #011act_id     = 242914778, 
 13:36:28   mysqld: #011last_appl. = -1, 
 13:36:28   mysqld: #011protocols  = 0/7/3 (gcs/repl/appl), 
 13:36:28   mysqld: #011group UUID = 8119e584-9f83-11e6-b292-7a8102156c2d 
 13:36:28   mysqld: 170630 13:36:28 [Note] WSREP: Flow-control interval: [28, 28] 
 13:36:28   mysqld: 170630 13:36:28 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 242914778) 
 13:36:28   mysqld: 170630 13:36:28 [Note] WSREP: State transfer required: 
 13:36:28   mysqld: #011Group state: 8119e584-9f83-11e6-b292-7a8102156c2d:242914778 
 13:36:28   mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1 
 13:36:28   mysqld: 170630 13:36:28 [Note] WSREP: New cluster view: global state: 8119e584-9f83-11e6-b292-7a8102156c2d:242914778, view# 515: Primary, number of nodes: 3, my index: 2, protocol version 3
 13:36:28   mysqld: 170630 13:36:28 [Warning] WSREP: Gap in state sequence. Need state transfer. 
 13:36:28   mysqld: 170630 13:36:28 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.***.***.*11' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '13253' --binlog '/var/log/mysql/mariadb-bin' '
 13:36:28   rsyncd[13316]: rsyncd version 3.1.1 starting, listening on port 4444 
 13:36:32   mysqld: 170630 13:36:32 [Note] WSREP: (85c5aae8, 'tcp://0.0.0.0:4567') turning message relay requesting off
 13:36:56   /etc/init.d/mysql[14935]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in

The difference appears right after this line:

rsyncd[13316]: rsyncd version 3.1.1 starting, listening on port 4444 

On the working cluster, the next line is:

WSREP: Prepared SST request: rsync|192.***.***.**2:4444/rsync_sst 

On the non-working cluster this line never appears, as if no SST request were ever issued.
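The comparison above can be automated. The following is a minimal sketch, assuming the mysqld and rsyncd messages end up in the same log file; the function name sst_request_prepared and the log path in the usage note are illustrative, not part of Galera.

```shell
#!/bin/sh
# Succeeds (exit status 0) only if the log contains a
# "WSREP: Prepared SST request" line after the most recent rsyncd start.
# The log file to inspect is passed as the first argument (an assumption:
# adjust it to wherever your syslog writes mysqld/rsyncd messages).
sst_request_prepared() {
    log_file=$1
    awk '
        # Every new rsyncd start resets the state, so only the last
        # join attempt in the file is evaluated.
        /rsyncd version .* starting/ { started = 1; prepared = 0; next }
        started && /WSREP: Prepared SST request/ { prepared = 1 }
        END { exit (started && prepared) ? 0 : 1 }
    ' "$log_file"
}
```

For example, `sst_request_prepared /var/log/syslog || echo "joiner never prepared an SST request"` would flag the failing node described here.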

If you think the configuration would help track down the problem, I can provide more details about it.

Thanks for your help!


1 Answer

Stack Overflow user

Answered 2017-08-04 15:49:35

I had the same problem, and this is what I found:

wsrep_sst_rsync was stuck in an endless loop. In my case this was because the output of lsof -i :$rsync_port was empty. For some (unknown) reason, lsof had the setgid bit set:

[dbserver1:~]# ls -l /usr/bin/lsof
-rwxr-sr-x 1 root root 163224 Oct 28  2015 /usr/bin/lsof
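The same condition can be checked without reading the ls output by eye. This is a small sketch using the POSIX test operator -g; the helper name has_setgid is made up for illustration.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given file has its setgid bit set,
# i.e. the "s" in the group triplet of the ls -l listing above.
has_setgid() {
    [ -g "$1" ]
}
```

On the affected server, `has_setgid /usr/bin/lsof && echo "lsof is setgid"` would print the message; the path is taken from the listing above and may differ on other systems.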

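This caused the endless loop in wsrep_sst_rsync, because the script checks whether rsync could be started. Removing the flag lets the script continue and eventually start the SST.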
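To illustrate why an empty probe result stalls the joiner, here is a simplified sketch of such a polling loop. It is not the actual wsrep_sst_rsync code: the names wait_for_listener, probe_cmd and max_tries are hypothetical, and the retry limit is added here only to show how an always-empty probe could be reported instead of looping forever.

```shell
#!/bin/sh
# Poll a probe command until it prints something, giving up after
# max_tries attempts. Hypothetical helper for illustration only.
wait_for_listener() {
    probe_cmd=$1      # e.g. 'lsof -i :4444 -Pn'
    max_tries=$2      # upper bound on polling attempts
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Non-empty output means the probe saw something on the port.
        if [ -n "$(sh -c "$probe_cmd" 2>/dev/null)" ]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep 1
    done
    # The probe stayed empty for every attempt: fail instead of spinning.
    return 1
}
```

With a probe whose output is always empty, as lsof's was in this case, an unbounded version of this loop never exits, which would explain why the "Prepared SST request" line is never logged.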

The flag can be removed with:

[dbserver1:~]# chmod g-s /usr/bin/lsof
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/44847182
