文章/答案/技术大牛

发布

社区首页 >问答首页 >没有明显的原因，MariaDB复制每天延迟十几次。

问没有明显的原因，MariaDB复制每天延迟十几次。
EN

Database Administration用户

提问于 2022-11-01 10:54:43

回答 2查看 354关注 0票数 1

我们首先使用功能较弱的服务器来建立MariaDB复制，但是考虑到我们增加的数据库负载，我们不得不租用更强大的服务器，这时问题就开始了。

现在，主从之间的复制有时会延迟在0到1000秒之间(它正在缓慢上升到1000秒，大约是两秒钟)，原因我们还无法确定。这种情况平均每天发生十几次。考虑到我公司业务的性质，我们根本承受不起任何滞后。

主

主处理器有6核Intel CPU，128 on内存，1TB SSD在raid 1配置中，平均负载略高于2，在Ubuntu20.04LTS上运行MariaDB 10.4.2x。

在过去的大约18个小时里，日志中只有两个与复制相关的错误/警告：

Oct 21 12:00:35 master.com mysqld[808]: 2022-10-21 12:00:35 5677386 [Warning] Aborted connection 5677386 to db: 'unconnected' user: 'replication' host: '1.2.3.4' (Got an error writing communication packets)
Oct 21 12:03:46 master.com mysqld[808]: 2022-10-21 12:03:46 5780035 [Warning] Aborted connection 5780035 to db: 'unconnected' user: 'replication' host: '1.2.3.4' (Got an error writing communication packets)

从

从站具有完全相同的HW配置，负载平均值接近1。

MariaDB [(none)]> show slave status\G 
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: master.com
                   Master_User: replication
                   Master_Port: 3306
                 Connect_Retry: 5
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 133718212
               Relay_Log_Space: 133718872
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
         Seconds_Behind_Master: 0
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 105
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-105-9982390796
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 2226622

服务器之间有1Gbit的连接，几乎是完美的。ping -f报告的丢失数据包不到0.002%：

root@master # ping -f slave.com
PING slave.com 56(84) bytes of data.
.^C
--- slave.com ping statistics ---
71492 packets transmitted, 71491 received, 0.00139876% packet loss, time 25747ms
rtt min/avg/max/mdev = 0.239/0.339/1.941/0.066 ms, ipg/ewma 0.360/0.335 ms

我在互联网上搜索过这个问题，但在试图理解问题的根本原因或如何调试问题方面，我没有找到任何令人满意的答案。

现在，它正在按预期工作，但早些时候，当我们遇到这个问题时，我在主人身上看到了这一点：

$ SELECT * FROM information_schema.processlist WHERE User="replication"\G;
           USER: replication
           HOST: slave.com:47200
             DB: NULL
        COMMAND: Binlog Dump
           TIME: 2041
          STATE: Master has sent all binlog to slave; waiting for binlog to be updated

我不知道该怎么做。为什么“它把所有的二进制日志都发送给奴隶了”？为什么这个时候的滞后还在增加呢？有什么坚持吗？为什么没有将主SQL查询发送给从服务器？

截至目前，世界协调时间2022-11-01时10:48，滞后正在缓慢增加：

MariaDB [(none)]> SHOW SLAVE STATUS\G;
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: master.com
                   Master_User: replication
                   Master_Port: 3306
                 Connect_Retry: 5
               Master_Log_File: mariadb-bin.002061
           Read_Master_Log_Pos: 395606907
                Relay_Log_File: mysqld-relay-bin.000253
                 Relay_Log_Pos: 340046809
         Relay_Master_Log_File: mariadb-bin.002061
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 340046508
               Relay_Log_Space: 395607567
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: 389
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 105
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-105-10062038390
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Closing tables
              Slave_DDL_Groups: 8
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 81798798
1 row in set (0.000 sec)

ERROR: No query specified

MariaDB [(none)]> SHOW FULL PROCESSLIST\G;
*************************** 1. row ***************************
      Id: 1
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge coordinator
    Info: NULL
Progress: 0.000
*************************** 2. row ***************************
      Id: 2
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge worker
    Info: NULL
Progress: 0.000
*************************** 3. row ***************************
      Id: 3
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge worker
    Info: NULL
Progress: 0.000
*************************** 4. row ***************************
      Id: 4
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB purge worker
    Info: NULL
Progress: 0.000
*************************** 5. row ***************************
      Id: 5
    User: system user
    Host: 
      db: NULL
 Command: Daemon
    Time: NULL
   State: InnoDB shutdown handler
    Info: NULL
Progress: 0.000
*************************** 6. row ***************************
      Id: 11
    User: system user
    Host: 
      db: NULL
 Command: Slave_IO
    Time: 945669
   State: Waiting for master to send event
    Info: NULL
Progress: 0.000
*************************** 7. row ***************************
      Id: 12
    User: system user
    Host: 
      db: NULL
 Command: Slave_SQL
    Time: 0
   State: Closing tables
    Info: NULL
Progress: 0.000
*************************** 8. row ***************************
      Id: 50
    User: replication
    Host: XX.XX.XX.XXX:52970
      db: NULL
 Command: Binlog Dump
    Time: 945656
   State: Master has sent all binlog to slave; waiting for binlog to be updated
    Info: NULL
Progress: 0.000
*************************** 9. row ***************************
      Id: 1244933
    User: replication
    Host: XX.XX.XX.XX:54514
      db: NULL
 Command: Binlog Dump
    Time: 83210
   State: Master has sent all binlog to slave; waiting for binlog to be updated
    Info: NULL
Progress: 0.000
*************************** 10. row ***************************
      Id: 1372686
    User: root
    Host: localhost
      db: NULL
 Command: Query
    Time: 0
   State: Init
    Info: SHOW FULL PROCESSLIST
Progress: 0.000
10 rows in set (0.000 sec)

ERROR: No query specified

MariaDB [(none)]>


MySQL on localhost (10.4.26)                                                                        load 1.08 1.31 1.29 1/388 427200 up 10+22:42:50 [10:47:22]
 Queries: 34.0     qps:    0 Slow:     0.0         Se/In/Up/De(%):    00/00/00/00 
 Sorts:      0 qps now:    1 Slow qps: 0.0  Threads:    3 (   1/   1) 00/00/00/00 
 Key Efficiency: 100.0%  Bps in/out:   0.0/  0.2   Now in/out:  21.2/ 3.8k
 Replication IO:Yes SQL:Yes Delay: 464 sec.

       Id      User         Host/IP         DB       Time    Cmd    State Query                                                                               
       --      ----         -------         --       ----    ---    ----- ----------                                                                          
       11 system us                                945768 Slave_  Waiting                                                                                     
       50 replicati          slave2                945756 Binlog Master h                                                                                     
  1244933 replicati          slave3                 83310 Binlog Master h                                                                                     
        1 system us                                     0 Daemon InnoDB p                                                                                     
        2 system us                                     0 Daemon InnoDB p                                                                                     
        3 system us                                     0 Daemon InnoDB p                                                                                     
        4 system us                                     0 Daemon InnoDB p                                                                                     
        5 system us                                     0 Daemon InnoDB s                                                                                     
       12 system us                                     0 Slave_  Closing                                                                                     
  1372858      root       localhost                     0  Query     Init show full processlist

请给我建议。

replication

mariadb-10.4

回答 2

Database Administration用户

回答已采纳

发布于 2023-01-05 18:55:39

原来这是MariaDB的一个已知的bug/特性。

https://mariadb.com/resources/blog/goodbye-replication-lag/

要解决这个问题，在所有充当从服务器的DB实例上，必须以这种方式将slave_parallel_threads变量从默认值1增加到更大的值：

STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE SQL_THREAD;
SELECT @@slave_parallel_threads;

不幸的是，此变量仅为运行时，需要在每次重新启动/重新启动MariaDB服务器后重置。

票数 0

Database Administration用户

发布于 2023-05-10 06:22:31

将其放入配置中以使其永久化：

slave_parallel_threads = 4

票数 0

页面原文内容由Database Administration提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://dba.stackexchange.com/questions/319051

复制

相似问题

问没有明显的原因，MariaDB复制每天延迟十几次。
EN

主

从

回答 2

Database Administration用户

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问没有明显的原因，MariaDB复制每天延迟十几次。EN

主

从

回答 2

Database Administration用户

Database Administration用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问没有明显的原因，MariaDB复制每天延迟十几次。
EN