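I am using an Amazon EC2 instance (with the Amazon server settings supplied by DHCP) together with an RDS database. The EC2 instance sits behind an ELB and receives heavy traffic. The application I run is written in PHP.
The problem is that when PHP tries to connect to the RDS database, it sometimes returns the following error:
PHP Warning: mysqli_connect(): (HY000/2005): Unknown MySQL server host ...
This does not happen often, but at times it gets much worse and I see thousands of these error events.
Any suggestions for diagnosing the problem? I am considering dumping all DNS traffic to a file and inspecting it, but the server handles so much traffic that it would be hard to trace anything in that file.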
Ip:
197171459 total packets received
1 with invalid addresses
0 forwarded
0 incoming packets discarded
197171458 incoming packets delivered
175015443 requests sent out
Icmp:
12528 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
destination unreachable: 188
echo requests: 12340
12559 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 219
echo replies: 12340
IcmpMsg:
InType3: 188
InType8: 12340
OutType0: 12340
OutType3: 219
Tcp:
5231380 active connections openings
3978862 passive connection openings
881 failed connection attempts
6420 connection resets received
17 connections established
191630575 segments received
200105352 segments send out
2797151 segments retransmited
0 bad segments received.
6910 resets sent
Udp:
5577451 packets received
219 packets to unknown port received.
0 packet receive errors
5577700 packets sent
UdpLite:
TcpExt:
172 invalid SYN cookies received
808 resets received for embryonic SYN_RECV sockets
7176788 TCP sockets finished time wait in fast timer
507 packets rejects in established connections because of timestamp
448055 delayed acks sent
2927 delayed acks further delayed because of locked socket
Quick ack mode was activated 2433 times
94865861 packets directly queued to recvmsg prequeue.
16611185 packets directly received from backlog
54150864749 packets directly received from prequeue
2158966 packets header predicted
79141174 packets header predicted and directly queued to user
40780030 acknowledgments not containing data received
56946553 predicted acknowledgments
84 times recovered from packet loss due to SACK data
Detected reordering 4 times using FACK
Detected reordering 11 times using SACK
Detected reordering 69 times using time stamp
70 congestion windows fully recovered
1241 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 13
2491 congestion windows recovered after partial ack
0 TCP data loss events
220 timeouts after SACK recovery
104 fast retransmits
99 forward retransmits
7 retransmits in slow start
2792531 other TCP timeouts
22 times receiver scheduled too late for direct processing
2423 DSACKs sent for old packets
2785871 DSACKs received
5162 connections reset due to unexpected data
921 connections reset due to early user close
135 connections aborted due to timeout
TCPDSACKIgnoredOld: 533
TCPDSACKIgnoredNoUndo: 393
TCPSackShifted: 477
TCPSackMerged: 536
TCPSackShiftFallback: 2709
TCPBacklogDrop: 46
TCPDeferAcceptDrop: 3906058
IpExt:
InOctets: 69400712361
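OutOctets: 94841399143
Answered 2012-04-25 15:45:34
There is a known AWS bug that causes DNS resolution to fail occasionally:
https://forums.aws.amazon.com/thread.jspa?messageID=330465#330465
You may want to test with persistent connections, since that reduces how often a DNS lookup has to be performed.
A local DNS cache (for example pdns-recursor or another caching resolver) will reduce the frequency, but the TTL on RDS hostname records is very short (60 seconds), so the problem will occur much less often but still a few times a day.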
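Because a 60-second TTL limits what a plain cache can do, one application-level option is to remember the last address that resolved successfully and reuse it when a lookup fails. The following is only a minimal sketch of that idea in Python, not something the AWS thread prescribes: the class name `StaleFallbackResolver`, the `max_stale_seconds` limit, and the hostname in the usage note are all hypothetical. Keep the stale window short, since an RDS endpoint can change its address (for example after a failover).

```python
import socket
import time


class StaleFallbackResolver:
    """Resolve a hostname, reusing the last good answer if a lookup fails.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, lookup=socket.gethostbyname,
                 max_stale_seconds=3600.0, clock=time.monotonic):
        self._lookup = lookup            # function: hostname -> IPv4 string
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._cache = {}                 # hostname -> (ip, time of last success)

    def resolve(self, hostname):
        try:
            ip = self._lookup(hostname)
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            cached = self._cache.get(hostname)
            if cached is None:
                raise                    # nothing to fall back to
            ip, resolved_at = cached
            if self._clock() - resolved_at > self._max_stale:
                raise                    # cached answer is too old to trust
            return ip
        # Successful lookup: refresh the cache entry.
        self._cache[hostname] = (ip, self._clock())
        return ip
```

A caller would create one resolver per process, call `resolver.resolve("mydb.example.invalid")` (a placeholder hostname), and pass the returned address to the database client in place of the hostname.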
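Answered 2011-08-21 06:39:59
You mention heavy traffic. I wonder whether you are running into a network problem. Have you monitored the SNMP statistics on the server? You should consider looking at some of the values in IF-MIB: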
IF-MIB::ifInOctets.1 = Counter32: 117194642
IF-MIB::ifInOctets.2 = Counter32: 3406296104
IF-MIB::ifInOctets.3 = Counter32: 754235769
IF-MIB::ifInOctets.4 = Counter32: 0
IF-MIB::ifInUcastPkts.1 = Counter32: 112415844
IF-MIB::ifInUcastPkts.2 = Counter32: 352495427
IF-MIB::ifInUcastPkts.3 = Counter32: 588414566
IF-MIB::ifInUcastPkts.4 = Counter32: 0
IF-MIB::ifInNUcastPkts.1 = Counter32: 0
IF-MIB::ifInNUcastPkts.2 = Counter32: 5038722
IF-MIB::ifInNUcastPkts.3 = Counter32: 4835908
IF-MIB::ifInNUcastPkts.4 = Counter32: 0
IF-MIB::ifInDiscards.1 = Counter32: 0
IF-MIB::ifInDiscards.2 = Counter32: 0
IF-MIB::ifInDiscards.3 = Counter32: 0
IF-MIB::ifInDiscards.4 = Counter32: 0
IF-MIB::ifInErrors.1 = Counter32: 0
IF-MIB::ifInErrors.2 = Counter32: 0
IF-MIB::ifInErrors.3 = Counter32: 0
IF-MIB::ifInErrors.4 = Counter32: 0
More information on this:
http://www.oidview.com/mibs/0/IF-MIB.html
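A single reading of these counters says little; what matters is how fast they change between two polls. Since they are Counter32 values, they wrap around at 2^32, which a busy interface can reach quickly for octet counts (IF-MIB also defines 64-bit variants such as ifHCInOctets for that reason). As a rough sketch in Python, with hypothetical function names, a wrap-aware delta and rate could be computed like this, assuming the counter wrapped at most once between the two samples:

```python
COUNTER32_MODULUS = 2 ** 32


def counter32_delta(previous, current):
    """Increase between two Counter32 samples, allowing for a single wrap."""
    return (current - previous) % COUNTER32_MODULUS


def rate_per_second(previous, current, interval_seconds):
    """Average increase per second over the polling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter32_delta(previous, current) / interval_seconds


# No wrap: the counter simply grew by 1000.
print(counter32_delta(117194642, 117195642))   # 1000
# Wrap: the counter passed 2**32 and restarted near zero.
print(counter32_delta(4294967290, 10))         # 16
```

A steadily rising ifInDiscards or ifInErrors rate would point toward a network-level problem; the all-zero values above do not.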
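You can also check some network statistics with:
# netstat -s
That said, I generally think that using IP addresses in your configuration files is a better option when referring to other servers in production.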
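To turn the raw `netstat -s` output into something comparable over time, it can help to derive a ratio such as retransmitted segments over segments sent. The Python sketch below is an illustration only: the function name is hypothetical, and the patterns assume the wording seen in the output quoted in the question ("segments send out", "segments retransmited"), while also accepting the corrected spellings other versions may print.

```python
import re


def tcp_retransmit_ratio(netstat_output):
    """Return retransmitted / sent TCP segments parsed from `netstat -s` text."""
    sent = re.search(r"(\d+) segments sen[dt] out", netstat_output)
    retrans = re.search(r"(\d+) segments retransmit+ed", netstat_output)
    if not sent or not retrans:
        raise ValueError("TCP segment counters not found in input")
    sent_count = int(sent.group(1))
    if sent_count == 0:
        return 0.0
    return int(retrans.group(1)) / sent_count


# Counters copied from the Tcp: section of the question's output.
sample = """\
Tcp:
191630575 segments received
200105352 segments send out
2797151 segments retransmited
"""
print(f"{tcp_retransmit_ratio(sample):.2%}")
```

With the counters from the question this comes to roughly 1.4% of sent segments. Sampling the ratio at intervals and comparing it with the times the DNS errors occur may show whether the two are related.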
https://serverfault.com/questions/302915