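I have found memory speed and latency problems on my server. I rent this server from OVH, and before they will replace the RAM, one of their requirements for starting an intervention is that I send logs in the ticket showing the identifier of the affected RAM module(s).
How can I detect the failing DRAM chip, given that this is a large (1 TB RAM) production server and I cannot take it out of service for several days to run memtest86+?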
sysbench --test=memory --memory-block-size=4G --memory-total-size=32G run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
block size: 4194304KiB
total size: 32768MiB
operation: write
scope: global
Initializing worker threads...
Threads started!
Total operations: 2 ( 0.15 per second)
8192.00 MiB transferred (630.16 MiB/sec)
General statistics:
total time: 12.9937s
total number of events: 2
Latency (ms):
min: 6338.94
avg: 6496.29
max: 6653.64
95th percentile: 6594.16
sum: 12992.58
Threads fairness:
events (avg/stddev): 2.0000/0.00
execution time (avg/stddev): 12.9926/0.00
sysbench
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: write
scope: global
Initializing worker threads...
Threads started!
Total operations: 48693603 (4868132.10 per second)
47552.35 MiB transferred (4754.04 MiB/sec)
General statistics:
total time: 10.0002s
total number of events: 48693603
Latency (ms):
min: 0.00
avg: 0.00
max: 0.47
95th percentile: 0.00
sum: 4155.88
Threads fairness:
events (avg/stddev): 48693603.0000/0.00
execution time (avg/stddev): 4.1559/0.00
sudo lshw -short -C memory
H/W path Device Class Description
==========================================================
/0/0 memory 64KiB BIOS
/0/20 memory 1TiB System Memory
/0/20/0 memory [empty]
/0/20/1 memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/20/2 memory [empty]
/0/20/3 memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/20/4 memory [empty]
/0/20/5 memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/20/6 memory [empty]
/0/20/7 memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/20/8 memory [empty]
/0/20/9 memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/20/a memory [empty]
/0/20/b memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/20/c memory [empty]
/0/20/d memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/20/e memory [empty]
/0/20/f memory 128GiB DIMM DDR4 Synchronous LRDIMM 2933 MHz (0.3 ns)
/0/23 memory 3MiB L1 cache
/0/24 memory 24MiB L2 cache
/0/25 memory 256MiB L3 cache
Posted 2021-07-20 21:46:10
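This is a long comment rather than an answer; perhaps it will prompt other suggestions. The basic idea is to reserve and allocate memory so that sysbench is forced to use a different memory area for its memory-block buffer, and then to see whether all of the memory performs the same.
The test system has only 32G of RAM, on 4 x 8G DIMMs.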
First, just run sysbench normally, but with a long enough test time to gather some information while it runs:
doug@s19:~/c$ sysbench --test=memory --memory-block-size=4G --memory-total-size=512G --num-threads=1 --time=60 run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
block size: 4194304KiB
total size: 524288MiB
operation: write
scope: global
Initializing worker threads...
Threads started!
Total operations: 128 ( 3.05 per second)
524288.00 MiB transferred (12509.51 MiB/sec)
General statistics:
total time: 41.9101s
total number of events: 128
Latency (ms):
min: 327.22
avg: 327.42
max: 328.09
95th percentile: 325.98
sum: 41909.72
Threads fairness:
events (avg/stddev): 128.0000/0.00
execution time (avg/stddev): 41.9097/0.00
During the 41 seconds the test took, I ran this in another terminal:
doug@s19:~$ ps aux | grep sysbench
doug 10489 92.2 12.8 4227496 4204436 pts/1 Sl+ 09:48 0:03 sysbench --test=memory --memory-block-size=4G --memory-total-size=512G --num-threads=1 --time=60 run
doug 10492 0.0 0.0 9040 732 pts/2 S+ 09:48 0:00 grep --color=auto sysbench
doug@s19:~$ pmap 10489 | grep anon | grep " 4195"
00007f25cbb5a000 4195456K rw--- [ anon ]
I repeated the whole cycle several times:
00007fa4bd7b4000 4195456K rw--- [ anon ]
00007f32c9878000 4195456K rw--- [ anon ]
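00007f4837211000 4195456K rw--- [ anon ]
The buffer lands at a different (virtual) address every time, which shows that we really don't know which physical memory is being used, nor how it maps back to the DIMMs. Let's carry on anyway.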
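EDIT: With address space layout randomization (ASLR) disabled, the above becomes repeatable:
sudo sysctl kernel.randomize_va_space=0
Now reserve some chunks of memory so that the sysbench buffer gets allocated somewhere else. Perhaps someone has a better idea, but I wrote a program: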
doug@s19:~/c$ cat reservem.c
/*****************************************************************************
*
* reservem.c 2021.07.20 Smythies
* allocate a chunk of memory for a while.
* current use is to force another program to use a different area
* of memory.
* see also: https://askubuntu.com/questions/1352756/ram-has-become-very-slow-low-write-speed-and-high-latency
* see also testm.c, from which this code started.
*
*****************************************************************************/
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free, exit */
#include <unistd.h>  /* sleep */
int main(){
    volatile char *fptr; /* volatile so an optimising compiler cannot drop the stores below */
    long i, k;
    /* Adjust as needed for use requirements */
    i = 8589934592; /* 8 GiB; requires a 64-bit long */
    if(( fptr = (volatile char *)malloc(i)) == NULL){
        printf("reservem: memory allocation failed, Exiting...\n");
        exit(-1);
    }
    for(k = 0; k < i; k++){ /* so that the memory really gets allocated and not just reserved */
        fptr[k] = (char) (k & 255);
    } /* endfor */
    printf("reservem: memory reserved and allocated. now sleeping...\n");
    sleep(180); /* so other tests and observation can be done. Adjust as required. */
    free((void *)fptr);
    printf("reservem: memory has been set free. Done and exiting...\n");
    return(0);
} /* endprogram */
Compiled as follows:
doug@s19:~/c$ cc reservem.c -o reservem
Then run it and, while the memory chunk is reserved and allocated, repeat the previous steps. The result:
doug@s19:~$ ps aux | grep sysbench
doug 11324 93.8 12.8 4227496 4204340 pts/1 Sl+ 13:58 0:09 sysbench --test=memory --memory-block-size=4G --memory-total-size=512G --num-threads=1 --time=60 run
doug 11327 0.0 0.0 9040 664 pts/2 S+ 13:59 0:00 grep --color=auto sysbench
doug@s19:~$ pmap 11324 | grep anon | grep " 4195"
00007fc5f2df8000 4195456K rw--- [ anon ]
For the reserved memory:
doug@s19:~$ ps aux | grep reservem
doug 11314 55.0 25.6 8391108 8389584 pts/0 S+ 13:57 0:11 ./reservem
doug 11318 0.0 0.0 9040 740 pts/2 S+ 13:57 0:00 grep --color=auto reservem
doug@s19:~$ pmap 11314 | grep anon | grep " 8388612K"
00007f11a6bfc000 8388612K rw--- [ anon ]
And the sysbench result:
524288.00 MiB transferred (12499.79 MiB/sec)
Likewise with two, and then three, 8G chunks reserved and allocated. 2X:
doug@s19:~$ ps aux | grep reservem
doug 11335 85.0 25.6 8391108 8389704 pts/0 S 14:07 0:11 ./reservem
doug 11336 92.0 25.6 8391108 8389672 pts/0 S 14:07 0:11 ./reservem
doug 11340 0.0 0.0 9040 736 pts/2 S+ 14:08 0:00 grep --color=auto reservem
doug@s19:~$ pmap 11335 | grep anon | grep " 8388612K"
00007fae2ce3f000 8388612K rw--- [ anon ]
doug@s19:~$ pmap 11336 | grep anon | grep " 8388612K"
00007f20cb627000 8388612K rw--- [ anon ]
doug@s19:~$ ps aux | grep sysbench
doug 11347 96.6 12.8 4227496 4204468 pts/1 Sl+ 14:08 0:12 sysbench --test=memory --memory-block-size=4G --memory-total-size=512G --num-threads=1 --time=60 run
doug 11350 0.0 0.0 9040 740 pts/2 S+ 14:08 0:00 grep --color=auto sysbench
doug@s19:~$ pmap 11347 | grep anon | grep " 4195"
00007f37dbe3c000 4195456K rw--- [ anon ]
And the sysbench result:
524288.00 MiB transferred (12521.74 MiB/sec)
3X:
doug@s19:~$ ps aux | grep reservem
doug 11388 100 21.0 8391108 6889064 pts/0 R 14:12 0:09 ./reservem
doug 11389 103 19.2 8391108 6292368 pts/0 R 14:12 0:08 ./reservem
doug 11390 100 16.3 8391108 5334328 pts/0 R 14:12 0:07 ./reservem
doug 11392 0.0 0.0 9040 724 pts/2 S+ 14:12 0:00 grep --color=auto reservem
doug@s19:~$ pmap 11388 | grep anon | grep " 8388612K"
00007f2b83d2d000 8388612K rw--- [ anon ]
doug@s19:~$ pmap 11389 | grep anon | grep " 8388612K"
00007f2921e0c000 8388612K rw--- [ anon ]
doug@s19:~$ pmap 11390 | grep anon | grep " 8388612K"
00007f2a23f2b000 8388612K rw--- [ anon ]
doug@s19:~$ ps aux | grep sysbench
doug 11402 107 12.8 4227496 4204420 pts/1 Sl+ 14:12 0:07 sysbench --test=memory --memory-block-size=4G --memory-total-size=512G --num-threads=1 --time=60 run
doug 11405 0.0 0.0 9040 672 pts/2 S+ 14:12 0:00 grep --color=auto sysbench
doug@s19:~$ pmap 11402 | grep anon | grep " 4195"
00007fe64b54b000 4195456K rw--- [ anon ]
And the sysbench result:
524288.00 MiB transferred (12504.34 MiB/sec)
For reference, here is how performance on this system varies with memory block size:
Block size   Performance (MiB/sec)
256          2750.99
512          4937.70
1K           8216.82
2K           12290.93
4K           16334.92
8K           19498.37
16K          21663.68
32K          22514.94
64K          23372.45
128K         23815.14
256K         23967.98
512K         24126.43
1M           24226.70
2M           24279.93
4M           24310.19
8M           23632.07
16M          20622.04
32M          16149.71
64M          14206.06
128M         13303.15
256M         12853.12
512M         12720.34
1G           12584.68
2G           12538.66
4G           12502.66
8G           12490.63
16G          12482.25
https://askubuntu.com/questions/1352756