I am running some experiments with Cachegrind, Callgrind, and gem5. I noticed that many accesses are counted as reads by Cachegrind, as writes by Callgrind, and as both reads and writes by gem5.
Let's take a very simple example:
int main() {
int i, l;
for (i = 0; i < 1000; i++) {
l++;
l++;
l++;
l++;
l++;
l++;
l++;
l++;
l++;
l++;
/* ... (100 times) */
}
}
I compiled it as follows:
gcc ex.c -static -o ex
So, basically, according to the asm file, addl $1, -8(%rbp) is executed 100,000 times. Since it is both a read and a write, I expected 100k reads and 100k writes. However, Cachegrind counts them only as reads, while Callgrind counts them only as writes.
% valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356==
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356==
==15356== I refs: 111,535
==15356== I1 misses: 475
==15356== LLi misses: 280
==15356== I1 miss rate: 0.42%
==15356== LLi miss rate: 0.25%
==15356==
==15356== D refs: 104,894 (103,791 rd + 1,103 wr)
==15356== D1 misses: 557 ( 414 rd + 143 wr)
==15356== LLd misses: 172 ( 89 rd + 83 wr)
==15356== D1 miss rate: 0.5% ( 0.3% + 12.9% )
==15356== LLd miss rate: 0.1% ( 0.0% + 7.5% )
==15356==
==15356== LL refs: 1,032 ( 889 rd + 143 wr)
==15356== LL misses: 452 ( 369 rd + 83 wr)
==15356== LL miss rate: 0.2% ( 0.1% + 7.5% )
% valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376==
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376==
==15376== Events : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376==
==15376== I refs: 111,532
==15376== I1 misses: 474
==15376== LLi misses: 279
==15376== I1 miss rate: 0.42%
==15376== LLi miss rate: 0.25%
==15376==
==15376== D refs: 104,894 (2,777 rd + 102,117 wr)
==15376== D1 misses: 557 ( 406 rd + 151 wr)
==15376== LLd misses: 172 ( 87 rd + 85 wr)
==15376== D1 miss rate: 0.5% ( 14.6% + 0.1% )
==15376== LLd miss rate: 0.1% ( 3.1% + 0.0% )
==15376==
==15376== LL refs: 1,031 ( 880 rd + 151 wr)
==15376== LL misses: 451 ( 366 rd + 85 wr)
==15376== LL miss rate: 0.2% ( 0.3% + 0.0% )
Can anyone give me a reasonable explanation? Am I right in thinking that there are in fact ~100k reads and ~100k writes (i.e., two cache accesses per addl)?
Posted on 2013-05-21 03:33:42
From the Cachegrind manual: 5.7.1. Cache Simulation Specifics
It looks like Callgrind's cache simulation logic differs from Cachegrind's. I would have thought Callgrind should produce the same results as Cachegrind, so maybe this is a bug?
Posted on 2013-04-22 23:07:44
By default, Callgrind does not do full cache simulation. See here: http://valgrind.org/docs/manual/cl-manual.html#cl-manual.options.cachesimulation
To have data read accesses counted, you need to pass --cache-sim=yes to Callgrind. That said, why use Callgrind on this code at all? There is not a single function call in it (and call-graph profiling is what Callgrind is for).
https://stackoverflow.com/questions/15790541