
Backup network card not activated when the primary card failed

Server Fault user
Asked on 2017-03-03 10:42:57

Our production server has 4 network cards, bonded two by two into 2 bonds.

External network: bond0: eth0 up and running, eth1 as the active-backup standby
Internal network: bond1: eth2 up and running, eth3 as the active-backup standby

We got a failure on eth0 and eth2 at the same time:

Mar  3 10:38:16 localhost kernel: [93739227.917537] tg3 0000:02:00.0 eth0: 0x000068b0: 0xe0011514, 0x00000000, 0x00000000, 0x00000000
Mar  3 10:38:16 localhost kernel: [93739227.930035] tg3 0000:02:00.0 eth0: 0x000068e0: 0x00000000, 0x00000000, 0x00000000, 0x0001c2cc
Mar  3 10:38:16 localhost kernel: [93739227.942529] tg3 0000:02:00.0 eth0: 0x000068f0: 0x00ff000e, 0x00ff0000, 0x00000000, 0x04444444
...
Mar  3 10:38:17 localhost kernel: [93739228.141585] tg3 0000:02:00.0 eth0: 4: NAPI info [0000000a:0000000a:(0000:0000:01ff):04dc:(04dc:04dc:0000:0000)]
Mar  3 10:38:17 localhost kernel: [93739228.201559] bonding: bond0: link status definitely down for interface eth0, disabling it
Mar  3 10:38:17 localhost kernel: [93739228.216343] tg3 0000:02:00.0 eth0: Link is down
Mar  3 10:38:18 localhost kernel: [93739229.253266] bonding: bond0: now running without any active interface !


Mar  3 10:38:18 localhost kernel: [93739229.253331] tg3 0000:08:00.0 eth2: transmit timed out, resetting
Mar  3 10:38:19 localhost kernel: [93739230.509553] tg3 0000:08:00.0 eth2: 0x00000000: 0x165f14e4, 0x00100406, 0x02000000, 0x00800010
Mar  3 10:38:19 localhost kernel: [93739230.521603] tg3 0000:08:00.0 eth2: 0x00000010: 0xd90a000c, 0x00000000, 0xd90b000c, 0x00000000
Mar  3 10:38:19 localhost kernel: [93739230.533658] tg3 0000:08:00.0 eth2: 0x00000020: 0xd90c000c, 0x00000000, 0x00000000, 0x200314e4
Mar  3 10:38:19 localhost kernel: [93739230.545704] tg3 0000:08:00.0 eth2: 0x00000030: 0xdd000000, 0x00000048, 0x00000000, 0x0000010f
Mar  3 10:38:19 localhost kernel: [93739230.557755] tg3 0000:08:00.0 eth2: 0x00000040: 0x00000000, 0xa5000000, 0xc8035001, 0x64002008
Mar  3 10:38:19 localhost kernel: [93739230.569808] tg3 0000:08:00.0 eth2: 0x00000050: 0x818c5803, 0x78000000, 0x0086a005, 0x00000000
...
Mar  3 10:38:23 localhost kernel: [93739234.611688] tg3 0000:08:00.0 eth2: 4: Host status block [00000001:000000df:(0000:0000:0a0f):(0000:0000)]
Mar  3 10:38:23 localhost kernel: [93739234.624030] tg3 0000:08:00.0 eth2: 4: NAPI info [000000c4:000000c4:(0000:0000:01ff):09d4:(01d4:01d4:0000:0000)]
Mar  3 10:38:23 localhost kernel: [93739234.699205] bonding: bond1: link status definitely down for interface eth2, disabling it
Mar  3 10:38:23 localhost kernel: [93739234.738410] tg3 0000:08:00.0: tg3_stop_block timed out, ofs=1400 enable_bit=2
Mar  3 10:38:23 localhost kernel: [93739234.850735] tg3 0000:08:00.0: tg3_stop_block timed out, ofs=c00 enable_bit=2
Mar  3 10:38:23 localhost kernel: [93739234.977285] tg3 0000:08:00.0 eth2: Link is down
Mar  3 10:38:25 localhost kernel: [93739236.081087] bonding: bond1: now running without any active interface !

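(1) Since this happened at the same time on two different networks, we suspect a hardware problem (the motherboard, or a micro-cut of the power supply, i.e. a failing PSU). Please tell me whether you agree with this diagnosis.

(2) The bonds are configured as active-backup precisely so that a hot standby network card is available in case of failure. As you can see here, the bond does not seem to have switched to the backup, or even to have considered it. When the incident happened, I checked ifconfig, and eth1 and eth3 (the backups) were correctly attached to their respective bonds.

What could be the problem if the bond did not switch over to the hot standby card?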

Edit: full network configuration:

bond0     Link encap:Ethernet  HWaddr 90:b1:1c:xxxxx  
          inet addr:195.178.186.222  Bcast:195.178.xxxxxxx    Mask:255.255.255.224
          inet6 addr: fe80::92xxxxa:4b1e/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:11806289 errors:0 dropped:563346 overruns:0 frame:0
          TX packets:15209428 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:2314496738 (2.3 GB)  TX bytes:17247449206 (17.2 GB)

bond1     Link encap:Ethernet  HWaddr 00:10:1xxxx:ce  
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::210:18ff:fed3:b1ce/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:161091053340 errors:0 dropped:1071 overruns:0 frame:13821
          TX packets:112926434041 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:99357307904176 (99.3 TB)  TX bytes:45744253012472 (45.7 TB)


eth0      Link encap:Ethernet  HWaddr 90:b1:xxxxxx4b:1e  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:11806289 errors:0 dropped:563346 overruns:0 frame:0
          TX packets:15209428 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2314496738 (2.3 GB)  TX bytes:17247449206 (17.2 GB)
          Interrupt:16 

eth1      Link encap:Ethernet  HWaddr 90:b1:1xxxxxx:1e  
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:17 

eth2      Link encap:Ethernet  HWaddr 00:10:xxxxx1:ce  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:161091053340 errors:0 dropped:1070 overruns:0 frame:13821
          TX packets:112926434041 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:99357307904176 (99.3 TB)  TX bytes:45744253012472 (45.7 TB)
          Interrupt:48 

eth3      Link encap:Ethernet  HWaddr 00:10xxxb1:ce  
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:52 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:6935638599 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6935638599 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:18028725295176 (18.0 TB)  TX bytes:18028725295176 (18.0 TB)

Here is /proc/net/bonding/bond0 (bond1 is similar):

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 90:b1:1c:4a:4b:1e
Slave queue ID: 0

Slave Interface: eth1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 90:b1:1c:4a:4b:1f
Slave queue ID: 0

1 Answer

Server Fault user

Answered on 2017-08-17 13:15:59

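We finally found out what was wrong when another problem came up. At the time of the incident, we had called the datacenter and asked them to send a technician to check the ports and cables. The answer we got was that everything was fine and the ports were blinking.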

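When I went to the datacenter for that other problem, I looked at the cables at the back of the machine. eth0 and eth2 had cables connected, but eth1 and eth3 were not even plugged in! How could they miss that!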
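In hindsight, the ifconfig output in the question already hinted at this: eth1 and eth3 carry the SLAVE flag but not RUNNING, which usually means the interface has no link. Below is a minimal Python sketch, not part of the original post, that scans ifconfig-style text for such interfaces. The function name `slaves_without_carrier` and the `SAMPLE` constant are hypothetical, and the parser assumes the old net-tools output format shown in the question.

```python
import re


def slaves_without_carrier(ifconfig_text):
    """Return bonded slave interfaces whose flag line lacks RUNNING.

    Assumes net-tools style output: the interface name starts in column 0,
    and the flag line is the one that carries the MTU: field.
    """
    suspects = []
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"^(\S+)\s+Link encap:", line)
        if header:
            current = header.group(1)
            continue
        words = line.split()
        if current and any(w.startswith("MTU:") for w in words):
            # Flags are the bare words on this line (no "key:value" form).
            flags = [w for w in words if ":" not in w]
            if "SLAVE" in flags and "RUNNING" not in flags:
                suspects.append(current)
    return suspects


# Header and flag lines copied from the ifconfig output in the question.
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 90:b1:xxxxxx4b:1e
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
eth1      Link encap:Ethernet  HWaddr 90:b1:1xxxxxx:1e
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
eth2      Link encap:Ethernet  HWaddr 00:10:xxxxx1:ce
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
eth3      Link encap:Ethernet  HWaddr 00:10xxxb1:ce
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
"""

print(slaves_without_carrier(SAMPLE))  # prints ['eth1', 'eth3']
```

An interface listed by this check is enslaved to a bond but probably cannot carry traffic, so it is worth inspecting its cable and switch port before relying on it for failover.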

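The moral of this story is that if the hot standby is configured and up but fails to take over, with nothing about it in the logs, then it is a cable or port problem. Also, check things yourself; do not trust others to do it for you, because they will not care.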
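One way to catch this before a failure is to look at the bonding status file itself: in the /proc/net/bonding/bond0 output from the question, eth1 reports "MII Status: down" while eth0 is the active slave. The following Python sketch is an illustration under that assumption, not the poster's method. It parses text in that format and lists standby slaves that are not up. The names `parse_bonding_status`, `unusable_standbys`, and `SAMPLE` are hypothetical.

```python
def parse_bonding_status(text):
    """Parse /proc/net/bonding/<bond> text.

    Returns (active_slave, {slave_name: mii_status}).
    """
    active = None
    slaves = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" appears before any slave section
            # and is skipped because current is still None there.
            slaves[current] = value
    return active, slaves


def unusable_standbys(text):
    """Return standby slaves whose MII status is not 'up'."""
    active, slaves = parse_bonding_status(text)
    return [name for name, status in slaves.items()
            if name != active and status != "up"]


# Excerpt of the bond0 status shown in the question.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

print(unusable_standbys(SAMPLE))  # prints ['eth1']
```

On a live host, the same function could be fed the contents of /proc/net/bonding/bond0 and /proc/net/bonding/bond1, for example from a periodic monitoring job, so that a dead standby is reported while the primary is still healthy.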

Original content provided by Server Fault.
Original link: https://serverfault.com/questions/836068
