(The question has been rephrased; I felt it needed to be better structured.)
We have a Proxmox system on a Dell PowerEdge R610 (gen 8). The platform is old, but the software we run on it does not benefit from modern CPU core features; its performance scales linearly with CPU clock frequency, and 3.3 GHz serves that goal well. Profiling shows that disk I/O is the severe bottleneck, while nothing else is.
The HW configuration:
The MegaRAID we use is not the built-in PERC. The built-in controller only does 1.5 Gbit/s SATA, which is too slow, and its JBOD/HBA mode is disabled. Instead, an added LSI 9240-4i runs the SSDs at the full 6 Gbit/s interface speed and allows JBOD mode.
The card has no battery and no cache, so, unsurprisingly, performance was too low when a RAID was built on it; therefore both disks are configured as JBOD and used with software RAID. The theoretical maximum of the 6 Gbit/s interface is 600 MB/s (6 Gbit/s × 8/10 for the 8b/10b line encoding = 4.8 Gbit/s = 600 MB/s), which is what to expect from a single-drive sequential test.
We ran extensive I/O tests under both Linux and Windows, both with identically configured fio. The only differences in the configuration were the aio library (windowsaio on Windows, libaio on Linux) and the test device specification. The fio jobs were adapted from this post: https://forum.proxmox.com/threads/pve-6-0-slow-ssd-raid1-performance-in-windows-vm.58559/#post-270657. I cannot show the full fio output because it would exceed the 30k-character ServerFault limit; if anyone wants to see it, I can share it elsewhere. Here I show only the summary lines. Linux (Proxmox) was configured with MD RAID1 and "thick" LVM on top of it.
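For reference, the job names follow the usual CrystalDiskMark-style matrix used in that thread; on the Linux side each job was roughly equivalent to invocations like the following (the LVM volume name and the size here are placeholders, not the exact values used):
# 128K sequential read at QD32, single job (the "128K-Q32T1-Seq-Read" case)
fio --name=128K-Q32T1-Seq-Read --ioengine=libaio --direct=1 --rw=read --bs=128k --iodepth=32 --numjobs=1 --filename=/dev/vg0/testvol --size=100G --group_reporting
# 4K random read at QD1, single job (the "4K-Q1T1-Rand-Read" case)
fio --name=4K-Q1T1-Rand-Read --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=1 --numjobs=1 --filename=/dev/vg0/testvol --size=100G --group_reporting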
Write caching on the SSDs is enabled:
# hdparm -W /dev/sd[ab]
/dev/sda:
write-caching = 1 (on)
/dev/sdb:
write-caching =  1 (on)

The devices run at the full 6 Gb/s interface speed:
# smartctl -i /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.3.10-1-pve] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 860 EVO 1TB
Serial Number: S4FMNE0MBxxxxxx
LU WWN Device Id: x xxxxxx xxxxxxxxx
Firmware Version: RVT03B6Q
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Feb 7 15:25:45 2020 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
# smartctl -i /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.3.10-1-pve] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 860 EVO 1TB
Serial Number: S4FMNE0MBxxxxxx
LU WWN Device Id: x xxxxxx xxxxxxxxx
Firmware Version: RVT03B6Q
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Feb 7 15:25:47 2020 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Partitions are carefully aligned to 1 MiB, and the "main" big partition (the LVM one, where all tests were run) starts exactly at 512 MiB:
# fdisk -l /dev/sd[ab]
Disk /dev/sda: 931,5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1DDCF7A0-D894-8C43-8975-C609D4C3C742
Device Start End Sectors Size Type
/dev/sda1 2048 524287 522240 255M EFI System
/dev/sda2 524288 526335 2048 1M BIOS boot
/dev/sda3 526336 1048575 522240 255M Linux RAID
/dev/sda4 1048576 1953525134 1952476559 931G Linux RAID
Disk /dev/sdb: 931,5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 63217472-3D2E-9444-917C-4776100B2D87
Device Start End Sectors Size Type
/dev/sdb1 2048 524287 522240 255M EFI System
/dev/sdb2 524288 526335 2048 1M BIOS boot
/dev/sdb3 526336 1048575 522240 255M Linux RAID
/dev/sdb4 1048576 1953525134 1952476559 931G Linux RAID

There is no bitmap:
# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 sda4[2] sdb4[0]
976106176 blocks super 1.2 [2/2] [UU]
md127 : active raid1 sda3[2] sdb3[0]
261056 blocks super 1.0 [2/2] [UU]
unused devices: <none>

The LVM was created with a 32 MiB PE size, so inside it everything is aligned to 32 MiB.
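Not the exact commands used, but for reference the 32 MiB PE size is set when the volume group is created and can be verified afterwards (VG and device names here are placeholders):
vgcreate -s 32M vg0 /dev/md126   # create the VG with a 32 MiB physical extent size
vgs -o +vg_extent_size           # confirm the 32 MiB PE size
pvs -o +pe_start                 # confirm where the first PE starts on the PV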
lsblk --discard shows that no device supports TRIM at all (not even non-queued). This is probably because the LSI2008 chip does not pass the command through. Queued TRIM is blacklisted for these SSDs anyway: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-core.c?id=9a9324d3969678d44b330e1230ad2c8ae67acf81. In any case this is the same thing Windows sees, so the comparison is fair.
The I/O scheduler is "none" on both disks. I also tried "mq-deadline" (the default), and it showed even worse results.
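For reference, the scheduler is switched per device through sysfs (standard kernel interface; sda/sdb as above):
cat /sys/block/sda/queue/scheduler          # e.g. [none] mq-deadline
echo none > /sys/block/sda/queue/scheduler
echo none > /sys/block/sdb/queue/scheduler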
With this configuration, fio shows the following results:
PVEHost-128K-Q32T1-Seq-Read bw=515MiB/s (540MB/s), 515MiB/s-515MiB/s (540MB/s-540MB/s), io=97.5GiB (105GB), run=194047-194047msec
PVEHost-128K-Q32T1-Seq-Write bw=239MiB/s (250MB/s), 239MiB/s-239MiB/s (250MB/s-250MB/s), io=97.7GiB (105GB), run=419273-419273msec
PVEHost-4K-Q8T8-Rand-Read bw=265MiB/s (278MB/s), 265MiB/s-265MiB/s (278MB/s-278MB/s), io=799GiB (858GB), run=3089818-3089818msec
PVEHost-4K-Q8T8-Rand-Write bw=132MiB/s (138MB/s), 132MiB/s-132MiB/s (138MB/s-138MB/s), io=799GiB (858GB), run=6214084-6214084msec
PVEHost-4K-Q32T1-Rand-Read bw=265MiB/s (278MB/s), 265MiB/s-265MiB/s (278MB/s-278MB/s), io=98.7GiB (106GB), run=380721-380721msec
PVEHost-4K-Q32T1-Rand-Write bw=132MiB/s (139MB/s), 132MiB/s-132MiB/s (139MB/s-139MB/s), io=99.4GiB (107GB), run=768521-768521msec
PVEHost-4K-Q1T1-Rand-Read bw=16.8MiB/s (17.6MB/s), 16.8MiB/s-16.8MiB/s (17.6MB/s-17.6MB/s), io=99.9GiB (107GB), run=6102415-6102415msec
PVEHost-4K-Q1T1-Rand-Write bw=36.4MiB/s (38.1MB/s), 36.4MiB/s-36.4MiB/s (38.1MB/s-38.1MB/s), io=99.8GiB (107GB), run=2811085-2811085msec

On exactly the same hardware, Windows was configured with a Logical Disk Manager mirror. The results:
WS2019-128K-Q32T1-Seq-Read bw=1009MiB/s (1058MB/s), 1009MiB/s-1009MiB/s (1058MB/s-1058MB/s), io=100GiB (107GB), run=101535-101535msec
WS2019-128K-Q32T1-Seq-Write bw=473MiB/s (496MB/s), 473MiB/s-473MiB/s (496MB/s-496MB/s), io=97.8GiB (105GB), run=211768-211768msec
WS2019-4K-Q8T8-Rand-Read bw=265MiB/s (278MB/s), 265MiB/s-265MiB/s (278MB/s-278MB/s), io=799GiB (858GB), run=3088236-3088236msec
WS2019-4K-Q8T8-Rand-Write bw=130MiB/s (137MB/s), 130MiB/s-130MiB/s (137MB/s-137MB/s), io=799GiB (858GB), run=6272968-6272968msec
WS2019-4K-Q32T1-Rand-Read bw=189MiB/s (198MB/s), 189MiB/s-189MiB/s (198MB/s-198MB/s), io=99.1GiB (106GB), run=536262-536262msec
WS2019-4K-Q32T1-Rand-Write bw=124MiB/s (130MB/s), 124MiB/s-124MiB/s (130MB/s-130MB/s), io=99.4GiB (107GB), run=823544-823544msec
WS2019-4K-Q1T1-Rand-Read bw=22.9MiB/s (24.0MB/s), 22.9MiB/s-22.9MiB/s (24.0MB/s-24.0MB/s), io=99.9GiB (107GB), run=4466576-4466576msec
WS2019-4K-Q1T1-Rand-Write bw=41.4MiB/s (43.4MB/s), 41.4MiB/s-41.4MiB/s (43.4MB/s-43.4MB/s), io=99.8GiB (107GB), run=2466593-2466593msec

Comparison:
test                    windows    none       mq-deadline  comment
128K-Q32T1-Seq-Read     1058MB/s   540MB/s    539MB/s      50% less than Windows, but this is expected
128K-Q32T1-Seq-Write    496MB/s    250MB/s    295MB/s      40-50% less than Windows!
4K-Q8T8-Rand-Read       278MB/s    278MB/s    278MB/s      same as Windows
4K-Q8T8-Rand-Write      137MB/s    138MB/s    127MB/s      almost same as Windows
4K-Q32T1-Rand-Read      198MB/s    278MB/s    276MB/s      40% more than Windows
4K-Q32T1-Rand-Write     130MB/s    139MB/s    130MB/s      similar to Windows
4K-Q1T1-Rand-Read       24.0MB/s   17.6MB/s   17.3MB/s     26% less than Windows
4K-Q1T1-Rand-Write      43.4MB/s   38.1MB/s   28.3MB/s     12-34% less than Windows

Linux MD RAID1 reads from both drives only when there are at least two reading threads. The first test is single-threaded, so Linux reads from a single drive and achieves single-drive performance; that is reasonable, and that first result is fine. But the others...
These are host-only tests. The last rows look even worse when we compare the same tests run inside a VM: Windows under PVE (fixed memory with no ballooning, fixed CPU frequency, virtio drivers v171, write-back cache with barriers) shows up to 70% lower figures than Windows under Hyper-V. Even Linux under PVE shows noticeably worse results than Windows under Hyper-V:
windows, windows, linux,
hyper-v pve pve
128K-Q32T1-Seq-Read 1058MB/s 856MB/s 554MB/s
128K-Q32T1-Seq-Write 461MB/s 375MB/s 514MB/s
4K-Q8T8-Rand-Read 273MB/s 327MB/s 254MB/s
4K-Q8T8-Rand-Write 135MB/s 139MB/s 138MB/s
4K-Q32T1-Rand-Read 220MB/s 198MB/s 210MB/s
4K-Q32T1-Rand-Write 131MB/s 146MB/s 140MB/s
4K-Q1T1-Rand-Read 18.2MB/s 5452kB/s 8701kB/s
4K-Q1T1-Rand-Write 26.7MB/s 7772kB/s 10.7MB/s

During these tests, Windows under Hyper-V stayed responsive despite the heavy I/O load, and so did Linux under PVE. But when Windows runs under PVE, its GUI crawls, RDP sessions tend to drop because of packet loss, and the load average on the host goes up to 48(!), mostly due to huge I/O wait.
During the tests, the load on one particular core, the one that happens to service the "megasas" interrupt, is quite high. The card exposes only a single interrupt source, so there is no way to spread this "in hardware". Windows did not show such a single-core hotspot during the tests, so it apparently uses some interrupt steering (spreading the handling across cores). Overall CPU load during the Windows host tests was lower than during the Linux host tests, but the two are not directly comparable.
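For reference, the single interrupt source is visible in /proc/interrupts, and it can at least be pinned to a chosen core through its smp_affinity mask; a generic sketch (the IRQ number 66 below is made up, take the real one from the grep output):
grep -i megasas /proc/interrupts     # one IRQ line; the per-CPU counters show which core services it
echo 4 > /proc/irq/66/smp_affinity   # hypothetical IRQ number; mask 0x4 pins it to CPU2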
The question is: why is it so bad, and am I missing something? Is it possible to get performance similar to Windows? (I am writing this with shaking hands; losing to Windows like this is really unpleasant.)
Additional tests suggested by @shodanshok:
[global]
ioengine=libaio
group_reporting
filename=/dev/vh0/testvol
direct=1
size=5G
[128K-Q1T32-Seq-Read]
rw=read
bs=128K
numjobs=32
stonewall
[128K-Q1T32-Seq-Write]
rw=write
bs=128K
numjobs=32
stonewall
[4K-Q1T32-Seq-Read]
rw=read
bs=4K
numjobs=32
stonewall
[4K-Q1T32-Seq-Write]
rw=write
bs=4K
numjobs=32
stonewall
[128K-Q1T2-Seq-Read]
rw=read
bs=128K
numjobs=2
stonewall
[128K-Q1T2-Seq-Write]
rw=write
bs=128K
numjobs=2
stonewall

Results:
128K-Q1T32-Seq-Read bw=924MiB/s (969MB/s), 924MiB/s-924MiB/s (969MB/s-969MB/s), io=160GiB (172GB), run=177328-177328msec
128K-Q1T32-Seq-Write bw=441MiB/s (462MB/s), 441MiB/s-441MiB/s (462MB/s-462MB/s), io=160GiB (172GB), run=371784-371784msec
4K-Q1T32-Seq-Read bw=261MiB/s (274MB/s), 261MiB/s-261MiB/s (274MB/s-274MB/s), io=160GiB (172GB), run=627761-627761msec
4K-Q1T32-Seq-Write bw=132MiB/s (138MB/s), 132MiB/s-132MiB/s (138MB/s-138MB/s), io=160GiB (172GB), run=1240437-1240437msec
128K-Q1T2-Seq-Read bw=427MiB/s (448MB/s), 427MiB/s-427MiB/s (448MB/s-448MB/s), io=10.0GiB (10.7GB), run=23969-23969msec
128K-Q1T2-Seq-Write bw=455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s), io=10.0GiB (10.7GB), run=22498-22498msec

This is strange: why is 128K-Q1T2-Seq-Read so bad? (The ideal value would be around 1200 MB/s, i.e. both mirror members read in parallel.) Is 5 GiB per job too small to be conclusive? Everything else looks reasonable.
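To rule out the per-job size, the same two-reader sequential test could be repeated with a larger size or time-based, e.g. (placeholder size/runtime, same test volume as above):
fio --name=128K-Q1T2-Seq-Read --ioengine=libaio --direct=1 --rw=read --bs=128k --numjobs=2 --filename=/dev/vh0/testvol --size=50G --time_based --runtime=300 --group_reporting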
Posted on 2020-02-07 08:06:20
With only two SATA disks in play, you are very unlikely to be limited by IRQ service time. Rather, the slow I/O you see is most likely a direct result of the MegaRAID controller disabling the disks' own private DRAM cache, which is critical for good performance on SSDs.
If you were using a PERC-branded MegaRAID card, you could enable the disks' private cache via omconfig storage vdisk controller=0 vdisk=0 diskcachepolicy=enabled (I am writing that from memory, as an example only; please check it against the omconfig CLI reference).
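The card in the question is an LSI-branded 9240-4i rather than a PERC, so the analogous knob would normally be reached through MegaCli/storcli instead of omconfig. A rough sketch, not verified on this card (it targets configured virtual drives, so whether it has any effect on JBOD pass-through needs checking):
storcli /c0/vall show all | grep -i cache   # inspect current cache settings
storcli /c0/vall set pdcache=on             # enable the drives' own write cache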
Anyway, be sure to understand what this means: if the disk cache is enabled on consumer (i.e. not power-loss-protected) SSDs, any power loss can mean lost data. If you host critical data, do not enable the disk cache; instead buy enterprise-grade SSDs whose write-back cache is power-loss protected (e.g. Intel S4510).
If, and only if, your data are expendable, feel free to enable the disks' internal cache.
More references: https://notesbytom.wordpress.com/2016/10/21/dell-perc-megaraid-disk-cache-policy/
https://serverfault.com/questions/1002138