我有一个Proxmox集群,包含两个物理服务器,pve和pve2。它们是相同的戴尔R710s 96 10的内存和1TB (RAID-10)。由于某些原因,我还没有确定,pve2将动力循环。我已经通过iDRAC检查了HW日志,没有警报或错误。
我对Proxmox相当陌生,所以我不知道在像syslog和dmesg这样的Linux站点之外在哪里查找错误日志。
下面是我的syslog在重新启动时的一个片段:(@ Dec 30 16:54:01)
Dec 30 16:50:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:50:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:50:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:51:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:51:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:51:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:52:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:52:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:52:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:53:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:53:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:53:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:54:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Dec 30 16:54:01 pve2 systemd[1]: pvesr.service: Succeeded.
Dec 30 16:54:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Dec 30 16:57:42 pve2 dmeventd[492]: dmeventd ready for processing.
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Linux version 5.4.73-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.73-1 (Mon, 16 Nov 2020 10:52:16 +0100) ()
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.73-1-pve root=/dev/mapper/pve-root ro quiet
Dec 30 16:57:42 pve2 kernel: [ 0.000000] KERNEL supported cpus:
Dec 30 16:57:42 pve2 systemd-modules-load[483]: Inserted module 'iscsi_tcp'
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Intel GenuineIntel
Dec 30 16:57:42 pve2 kernel: [ 0.000000] AMD AuthenticAMD
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Hygon HygonGenuine
Dec 30 16:57:42 pve2 kernel: [ 0.000000] Centaur CentaurHauls
Dec 30 16:57:42 pve2 kernel: [ 0.000000] zhaoxin Shanghai
Dec 30 16:57:42 pve2 systemd[1]: Starting Flush Journal to Persistent Storage...
Dec 30 16:57:42 pve2 kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-provided physical RAM map:
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf378fff] usable
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf379000-0x00000000bf38efff] reserved
Dec 30 16:57:42 pve2 systemd[1]: Started udev Coldplug all Devices.
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf38f000-0x00000000bf3cdfff] ACPI data
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf3ce000-0x00000000bfffffff] reserved
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
Dec 30 16:57:42 pve2 systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000ffffffff] reserved
Dec 30 16:57:42 pve2 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000183fffffff] usable
Dec 30 16:57:42 pve2 kernel: [ 0.000000] NX (Execute Disable) protection: active
Dec 30 16:57:42 pve2 kernel: [ 0.000000] SMBIOS 2.6 present.
Dec 30 16:57:42 pve2 kernel: [ 0.000000] DMI: Dell Inc. PowerEdge R710/0Y7JM4, BIOS 6.3.0 07/24/2012关于它可能是什么或者我可以在哪里检查,有什么建议吗?
发布于 2021-02-12 11:32:48
可能是硬件故障,最常见的是RAM故障。
在服务器上运行memtest以找出答案。
https://serverfault.com/questions/1048116
复制相似问题