首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >没有明显的原因,MariaDB复制每天延迟十几次。

没有明显的原因,MariaDB复制每天延迟十几次。
EN

Database Administration用户
提问于 2022-11-01 10:54:43
回答 2查看 354关注 0票数 1

我们首先使用功能较弱的服务器来建立MariaDB复制,但是考虑到我们增加的数据库负载,我们不得不租用更强大的服务器,这时问题就开始了。

现在,主从之间的复制有时会延迟在0到1000秒之间(它正在缓慢上升到1000秒,大约是两秒钟),原因我们还无法确定。这种情况平均每天发生十几次。考虑到我公司业务的性质,我们根本承受不起任何滞后。

主处理器有6核Intel CPU,128 on内存,1TB SSD在raid 1配置中,平均负载略高于2,在Ubuntu20.04LTS上运行MariaDB 10.4.2x。

在过去的大约18个小时里,日志中只有两个与复制相关的错误/警告:

代码语言:javascript
复制
Oct 21 12:00:35 master.com mysqld[808]: 2022-10-21 12:00:35 5677386 [Warning] Aborted connection 5677386 to db: 'unconnected' user: 'replication' host: '1.2.3.4' (Got an error writing communication packets)
Oct 21 12:03:46 master.com mysqld[808]: 2022-10-21 12:03:46 5780035 [Warning] Aborted connection 5780035 to db: 'unconnected' user: 'replication' host: '1.2.3.4' (Got an error writing communication packets)

从站具有完全相同的HW配置,负载平均值接近1。

代码语言:javascript
复制
MariaDB [(none)]> show slave status\G 
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: master.com
                   Master_User: replication
                   Master_Port: 3306
                 Connect_Retry: 5
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 133718212
               Relay_Log_Space: 133718872
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
         Seconds_Behind_Master: 0
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 105
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-105-9982390796
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 2226622

服务器之间有1Gbit的连接,几乎是完美的。ping -f报告的丢失数据包不到0.002%:

代码语言:javascript
复制
root@master # ping -f slave.com
PING slave.com 56(84) bytes of data.
.^C
--- slave.com ping statistics ---
71492 packets transmitted, 71491 received, 0.00139876% packet loss, time 25747ms
rtt min/avg/max/mdev = 0.239/0.339/1.941/0.066 ms, ipg/ewma 0.360/0.335 ms

我在互联网上搜索过这个问题,但在试图理解问题的根本原因或如何调试问题方面,我没有找到任何令人满意的答案。

现在,它正在按预期工作,但早些时候,当我们遇到这个问题时,我在主人身上看到了这一点:

代码语言:javascript
复制
$ SELECT * FROM information_schema.processlist WHERE User="replication"\G;
           USER: replication
           HOST: slave.com:47200
             DB: NULL
        COMMAND: Binlog Dump
           TIME: 2041
          STATE: Master has sent all binlog to slave; waiting for binlog to be updated

我不知道该怎么做。为什么“它把所有的二进制日志都发送给奴隶了”?为什么这个时候的滞后还在增加呢?有什么坚持吗?为什么没有将主SQL查询发送给从服务器?

截至目前,世界协调时间2022-11-01时10:48,滞后正在缓慢增加:

代码语言:javascript
复制
MariaDB [(none)]> SHOW SLAVE STATUS\G;
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: master.com
                   Master_User: replication
                   Master_Port: 3306
                 Connect_Retry: 5
               Master_Log_File: mariadb-bin.002061
           Read_Master_Log_Pos: 395606907
                Relay_Log_File: mysqld-relay-bin.000253
                 Relay_Log_Pos: 340046809
         Relay_Master_Log_File: mariadb-bin.002061
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 340046508
               Relay_Log_Space: 395607567
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: 389
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 105
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-105-10062038390
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Closing tables
              Slave_DDL_Groups: 8
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 81798798
1 row in set (0.000 sec)

ERROR: No query specified

MariaDB [(none)]> SHOW FULL PROCESSLIST\G;
*************************** 1. row ***************************
      Id: 1
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge coordinator
    Info: NULL
Progress: 0.000
*************************** 2. row ***************************
      Id: 2
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge worker
    Info: NULL
Progress: 0.000
*************************** 3. row ***************************
      Id: 3
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge worker
    Info: NULL
Progress: 0.000
*************************** 4. row ***************************
      Id: 4
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge worker
    Info: NULL
Progress: 0.000
*************************** 5. row ***************************
      Id: 5
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB shutdown handler
    Info: NULL
Progress: 0.000
*************************** 6. row ***************************
      Id: 11
    User: system user
    Host: 
      db: NULL
 Command: Slave_IO
    Time: 945669
   State: Waiting for master to send event
    Info: NULL
Progress: 0.000
*************************** 7. row ***************************
      Id: 12
    User: system user
    Host: 
      db: NULL
 Command: Slave_SQL
    Time: 0
   State: Closing tables
    Info: NULL
Progress: 0.000
*************************** 8. row ***************************
      Id: 50
    User: replication
    Host: XX.XX.XX.XXX:52970
      db: NULL
 Command: Binlog Dump
    Time: 945656
   State: Master has sent all binlog to slave; waiting for binlog to be updated
    Info: NULL
Progress: 0.000
*************************** 9. row ***************************
      Id: 1244933
    User: replication
    Host: XX.XX.XX.XX:54514
      db: NULL
 Command: Binlog Dump
    Time: 83210
   State: Master has sent all binlog to slave; waiting for binlog to be updated
    Info: NULL
Progress: 0.000
*************************** 10. row ***************************
      Id: 1372686
    User: root
    Host: localhost
      db: NULL
 Command: Query
    Time: 0
   State: Init
    Info: SHOW FULL PROCESSLIST
Progress: 0.000
10 rows in set (0.000 sec)

ERROR: No query specified

MariaDB [(none)]>


MySQL on localhost (10.4.26)                                                                        load 1.08 1.31 1.29 1/388 427200 up 10+22:42:50 [10:47:22]
 Queries: 34.0     qps:    0 Slow:     0.0         Se/In/Up/De(%):    00/00/00/00 
 Sorts:      0 qps now:    1 Slow qps: 0.0  Threads:    3 (   1/   1) 00/00/00/00 
 Key Efficiency: 100.0%  Bps in/out:   0.0/  0.2   Now in/out:  21.2/ 3.8k
 Replication IO:Yes SQL:Yes Delay: 464 sec.

       Id      User         Host/IP         DB       Time    Cmd    State Query                                                                               
       --      ----         -------         --       ----    ---    ----- ----------                                                                          
       11 system us                                945768 Slave_  Waiting                                                                                     
       50 replicati          slave2                945756 Binlog Master h                                                                                     
  1244933 replicati          slave3                 83310 Binlog Master h                                                                                     
        1 system us                                     0 Daemon InnoDB p                                                                                     
        2 system us                                     0 Daemon InnoDB p                                                                                     
        3 system us                                     0 Daemon InnoDB p                                                                                     
        4 system us                                     0 Daemon InnoDB p                                                                                     
        5 system us                                     0 Daemon InnoDB s                                                                                     
       12 system us                                     0 Slave_  Closing                                                                                     
  1372858      root       localhost                     0  Query     Init show full processlist   

请给我建议。

EN

回答 2

Database Administration用户

回答已采纳

发布于 2023-01-05 18:55:39

原来这是MariaDB的一个已知的bug/特性。

https://mariadb.com/resources/blog/goodbye-replication-lag/

要解决这个问题,在所有充当从服务器的DB实例上,必须以这种方式将slave_parallel_threads变量从默认值1增加到更大的值:

代码语言:javascript
复制
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE SQL_THREAD;
SELECT @@slave_parallel_threads;

不幸的是,此变量仅为运行时,需要在每次重新启动/重新启动MariaDB服务器后重置。

票数 0
EN

Database Administration用户

发布于 2023-05-10 06:22:31

将其放入配置中以使其永久化:

代码语言:javascript
复制
slave_parallel_threads = 4
票数 0
EN
页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://dba.stackexchange.com/questions/319051

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档