I am running some experiments with Cachegrind, Callgrind, and gem5. I noticed that many accesses are counted as reads by Cachegrind, as writes by Callgrind, and as both reads and writes by gem5.
Let's take a very simple example:
int main() {
int i, l;
for (i = 0; i < 1000; i++) {
l++;
l++;
l++;
l++;
l++;
l++;
l++;
l++;
l++;
l++;
/* ... (100 times) */
}
}
I compiled it as follows:
gcc ex.c -static -o ex
So, basically, according to the asm file, addl $1, -8(%rbp) is executed 100,000 times. Since it is both a read and a write, I expected 100k reads and 100k writes. However, Cachegrind counts them only as reads, while Callgrind counts them only as writes.
% valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356==
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356==
==15356== I refs: 111,535
==15356== I1 misses: 475
==15356== LLi misses: 280
==15356== I1 miss rate: 0.42%
==15356== LLi miss rate: 0.25%
==15356==
==15356== D refs: 104,894 (103,791 rd + 1,103 wr)
==15356== D1 misses: 557 ( 414 rd + 143 wr)
==15356== LLd misses: 172 ( 89 rd + 83 wr)
==15356== D1 miss rate: 0.5% ( 0.3% + 12.9% )
==15356== LLd miss rate: 0.1% ( 0.0% + 7.5% )
==15356==
==15356== LL refs: 1,032 ( 889 rd + 143 wr)
==15356== LL misses: 452 ( 369 rd + 83 wr)
==15356== LL miss rate: 0.2% ( 0.1% + 7.5% )
% valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376==
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376==
==15376== Events : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376==
==15376== I refs: 111,532
==15376== I1 misses: 474
==15376== LLi misses: 279
==15376== I1 miss rate: 0.42%
==15376== LLi miss rate: 0.25%
==15376==
==15376== D refs: 104,894 (2,777 rd + 102,117 wr)
==15376== D1 misses: 557 ( 406 rd + 151 wr)
==15376== LLd misses: 172 ( 87 rd + 85 wr)
==15376== D1 miss rate: 0.5% ( 14.6% + 0.1% )
==15376== LLd miss rate: 0.1% ( 3.1% + 0.0% )
==15376==
==15376== LL refs: 1,031 ( 880 rd + 151 wr)
==15376== LL misses: 451 ( 366 rd + 85 wr)
==15376== LL miss rate: 0.2% ( 0.3% + 0.0% )
Can anyone give me a reasonable explanation? Am I right in thinking that there are in fact ~100k reads and ~100k writes (i.e., two cache accesses per addl)?
Posted on 2013-05-21 03:33:42
From the Cachegrind manual: 5.7.1. Cache Simulation Specifics
It looks like Callgrind's cache simulation logic differs from Cachegrind's. I would have thought Callgrind should produce the same results as Cachegrind, so maybe this is a bug?
Posted on 2013-04-22 23:07:44
By default, Callgrind does not do full cache simulation. See here: http://valgrind.org/docs/manual/cl-manual.html#cl-manual.options.cachesimulation
To have data read accesses counted, you need to pass --cache-sim=yes to Callgrind. That said, why use Callgrind on this code at all? There is not a single function call in it (and call-graph profiling is what Callgrind is for).
https://stackoverflow.com/questions/15790541