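We are seeing a clock drift issue with our MongoDB replica set running on AWS. It only seems to have started after we added an additional data set. Before that, we hadn't really noticed the problem except when the system was under heavy load. The following error is logged intermittently in mongod.log, even while the system is not under load.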
To test this, we isolated a set of machines with the same data set that our web application does not use, and the error still occurs:
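2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target because current sync target's most recent OpTime is Dec 12 13:32:42:c which is more than 30 seconds behind member mongodb1:27017 whose most recent OpTime is 1418391230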
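As you can see from the timestamps above, one of the replica set members is more than a minute behind. The worst case we have seen is 12 minutes out of sync.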
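To see where "more than a minute" comes from, the two OpTimes in the log line can be compared directly. The member's OpTime is printed as raw seconds since the Unix epoch, while the sync target's is printed as a date. This sketch assumes the sync target's OpTime is 13:32:42 UTC on 12 Dec 2014 (the +0000 log timestamp suggests the server runs in UTC) and ignores the trailing ":c", which as far as we know is the OpTime's increment counter rather than part of the time.

```python
from datetime import datetime, timezone

# OpTime of the member the log compares against: seconds since the Unix epoch.
candidate_secs = 1418391230
candidate = datetime.fromtimestamp(candidate_secs, tz=timezone.utc)

# Sync target's OpTime, "Dec 12 13:32:42:c".
# Assumptions: year 2014, UTC, and the ":c" increment is ignored.
sync_target = datetime(2014, 12, 12, 13, 32, 42, tzinfo=timezone.utc)

lag_seconds = (candidate - sync_target).total_seconds()
print(candidate.isoformat())  # 2014-12-12T13:33:50+00:00
print(lag_seconds)            # 68.0
```

So at the moment of that log line, the sync target was 68 seconds behind the other member.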
This in turn causes replication lag, and we get alerts about it from our Mongo monitoring service, although the lag does correct itself.
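If you want to watch the lag yourself rather than wait for monitoring alerts, a rough sketch is to compare each member's optimeDate from the replSetGetStatus command. The function below is hypothetical and works on the plain dict that command returns (for example via pymongo's MongoClient(...).admin.command("replSetGetStatus")), so it can be tried without a server. As we understand it, oplog timestamps are assigned on the primary, so this measures how far behind each secondary is in applying writes rather than comparing the machines' system clocks.

```python
from datetime import datetime

def replication_lag(status):
    """Seconds each SECONDARY trails the PRIMARY, from a replSetGetStatus result.

    Assumption: `status` is the dict returned by
    MongoClient(...).admin.command("replSetGetStatus") in pymongo.
    optimeDate is the time of the last oplog entry the member has applied.
    """
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    if primary is None:
        return {}  # no primary visible, so lag relative to it is undefined
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m.get("stateStr") == "SECONDARY"
    }

# Hypothetical input shaped like replSetGetStatus output (values illustrative only).
example = {"members": [
    {"name": "mongodb1:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2014, 12, 12, 13, 33, 50)},
    {"name": "mongodb2:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2014, 12, 12, 13, 32, 42)},
]}
print(replication_lag(example))  # {'mongodb2:27017': 68.0}
```

pymongo returns naive UTC datetimes by default, which is why the example uses naive values.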
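The setup is 3 x r3.xlarge AWS instances, one in each availability zone of the EU-West-1 region. The machines were built with MongoDB's recommended settings, using RAID arrays and the CloudFormation scripts provided by MongoDB. The data size is about 4GB.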
We thought the problem was related to NTP synchronisation. By default, the ntpd service is configured to use the Amazon server pool from www.pool.ntp.org.
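To rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync against. The problem persisted, so we changed the minpoll and maxpoll settings of the ntpd service on the Mongo machines so that they sync with the NTP server every 16 seconds, but the error still occurs.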
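ntpd's minpoll and maxpoll options take the poll interval as a power of two in seconds, so a 16-second interval is written as 4, not 16. The helper below is a hypothetical illustration that builds such a server line; the 4 to 17 range is taken from the ntpd 4.2.x documentation, and time-server.domain.com is the placeholder host from the ntp.conf in the question.

```python
import math

def ntp_server_line(host, interval_seconds):
    """Build an ntp.conf `server` line pinned to one poll interval.

    minpoll/maxpoll are log2 of the interval in seconds (16 s -> 4).
    Bounds 4..17 follow the ntpd 4.2.x documentation.
    """
    exponent = int(math.log2(interval_seconds))
    if 2 ** exponent != interval_seconds:
        raise ValueError("ntpd poll intervals must be a power of two")
    if not 4 <= exponent <= 17:
        raise ValueError("poll exponent out of range for ntpd 4.2.x")
    return f"server {host} iburst minpoll {exponent} maxpoll {exponent}"

print(ntp_server_line("time-server.domain.com", 16))
# server time-server.domain.com iburst minpoll 4 maxpoll 4
```

For comparison, the ntpq output in the question shows a poll value of 1024 (2^10, ntpd's default maxpoll), and the ntp.conf listed has no minpoll/maxpoll options, so that particular capture appears to have been taken with default polling rather than the 16-second setting.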
We also increased the size of the MongoDB oplog to see whether that would make any difference, but it didn't.
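A larger oplog lengthens the window of history it holds, which gives a lagging secondary longer before it falls off the end and needs a full resync; as far as we know it does not make the secondary apply writes any faster. A minimal sketch for checking that window, assuming you pass in the seconds field of the ts Timestamp from the oldest and newest entries in local.oplog.rs (the pymongo calls in the docstring are one way to fetch them):

```python
def oplog_window_hours(first_ts_seconds, last_ts_seconds):
    """Hours of history held in the oplog.

    Assumption: inputs are the `.time` attribute of the bson Timestamp in the
    `ts` field of the first and last documents in local.oplog.rs, e.g.:
        oplog = client.local["oplog.rs"]
        first = next(oplog.find().sort("$natural", 1).limit(1))["ts"].time
        last = next(oplog.find().sort("$natural", -1).limit(1))["ts"].time
    """
    if last_ts_seconds < first_ts_seconds:
        raise ValueError("newest entry is older than oldest entry")
    return (last_ts_seconds - first_ts_seconds) / 3600.0

# Hypothetical values: an oplog whose entries span exactly one day.
print(oplog_window_hours(1418391230 - 86400, 1418391230))  # 24.0
```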
Has anyone else come across this kind of problem? Is there something we're missing?
Cheers,
Colin.
ps -ef | grep ntp
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv
remote refid st t when poll reach delay offset jitter
==============================================================================
*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161
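The peer row above is easier to read with its columns named. This hypothetical parser assumes the standard ntpq -p layout shown in the header (remote, refid, st, t, when, poll, reach, delay, offset, jitter), with the first character being the tally code ("*" marks the current system peer). delay, offset and jitter are in milliseconds, and reach is an octal register of the last eight polls.

```python
NTPQ_COLUMNS = ["remote", "refid", "st", "t", "when", "poll",
                "reach", "delay", "offset", "jitter"]

def parse_ntpq_peer(line):
    """Split one `ntpq -p` peer row into named fields."""
    tally, rest = line[0], line[1:]  # first char is the tally code
    fields = dict(zip(NTPQ_COLUMNS, rest.split()))
    fields["tally"] = tally
    for key in ("delay", "offset", "jitter"):  # milliseconds
        fields[key] = float(fields[key])
    fields["poll"] = int(fields["poll"])  # seconds between polls
    # reach is octal: each set bit is one of the last 8 polls that got a reply.
    fields["recent_replies"] = bin(int(fields["reach"], 8)).count("1")
    return fields

peer = parse_ntpq_peer("*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161")
print(peer["remote"], peer["offset"], peer["poll"], peer["recent_replies"])
# 172.31.14.137 -0.264 1024 8
```

Read this way, the host where the command was run was about a quarter of a millisecond from its NTP server, with all of the last eight polls answered, at the moment of the capture. That is far smaller than the minute-plus gaps described in the question.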
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
Posted on 2015-05-06 08:23:04
After upgrading to MongoDB 3 with the WiredTiger storage engine, we no longer see this issue.
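In MongoDB 3.0, MMAPv1 was still the default storage engine, so WiredTiger had to be chosen explicitly (--storageEngine wiredTiger); it can be worth confirming which engine each member actually runs. A minimal sketch, assuming the dict returned by pymongo's MongoClient(...).admin.command("serverStatus"); the storageEngine section was added in 3.0, so a missing section is treated here as the 2.6-era MMAPv1.

```python
def storage_engine(server_status):
    """Storage engine name from a serverStatus result.

    Assumption: `server_status` is the dict from
    MongoClient(...).admin.command("serverStatus"). MongoDB 2.6 has no
    storageEngine section and only supported MMAPv1, so that is the fallback.
    """
    return server_status.get("storageEngine", {}).get("name", "mmapv1")

print(storage_engine({"storageEngine": {"name": "wiredTiger"}}))  # wiredTiger
print(storage_engine({}))                                          # mmapv1
```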
https://stackoverflow.com/questions/27447810