
Q: gprof vs. cachegrind profiles

Stack Overflow user

Asked 2011-06-11 23:17:57

Answers: 2, Views: 5.5K, Followers: 0, Votes: 12

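While trying to optimize my code, I am a bit confused by the differences between the profiles generated by kcachegrind and gprof. Specifically, if I use gprof (compiling with the -pg switch, etc.), I get: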

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 89.62      3.71     3.71   204626     0.02     0.02  objR<true>::R_impl(std::vector<coords_t, std::allocator<coords_t> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&) const
  5.56      3.94     0.23 18018180     0.00     0.00  W2(coords_t const&, coords_t const&)
  3.87      4.10     0.16   200202     0.00     0.00  build_matrix(std::vector<coords_t, std::allocator<coords_t> > const&)
  0.24      4.11     0.01   400406     0.00     0.00  std::vector<double, std::allocator<double> >::vector(std::vector<double, std::allocator<double> > const&)
  0.24      4.12     0.01   100000     0.00     0.00  Wrat(std::vector<coords_t, std::allocator<coords_t> > const&, std::vector<coords_t, std::allocator<coords_t> > const&)
  0.24      4.13     0.01        9     1.11     1.11  std::vector<short, std::allocator<short> >* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::vector<short, std::alloca

This seems to suggest that I don't need to bother looking anywhere other than ::R_impl(...).

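Meanwhile, if I compile without the -pg switch and instead run valgrind --tool=callgrind ./a.out, I get completely different results. Here is a screenshot of the kcachegrind output:

[kcachegrind screenshot]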

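If I understand correctly, this seems to indicate that ::R_impl(...) takes only about 50% of the time, while the other half is spent in linear algebra (Wrat(...), eigenvalues, and the underlying LAPACK calls), which ranks much lower in the gprof profile.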

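I know that gprof and cachegrind use different techniques, and I wouldn't mind if their results differed somewhat. But here they look very different, and I don't know how to interpret them. Any ideas or suggestions?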

c++

optimization

profiling

valgrind

gprof

2 Answers

Stack Overflow user

Accepted answer

Answered 2011-08-06 01:51:17

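You are looking at the wrong column. You have to look at the second column of the kcachegrind output, the one named "self". This is the time spent in that particular subroutine alone, not counting its children. The first column holds the inclusive (cumulative) time, which amounts to 100% of the machine time for main, and it is not very informative (in my opinion).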

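Note that the kcachegrind output shows a total process time of 53.64 seconds, of which 46.72 seconds are spent in the subroutine "R_impl", i.e. 87% of the total. So gprof and kcachegrind agree almost perfectly.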
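The 87% figure is R_impl's self time divided by the total run time reported by kcachegrind, which is close to the 89.62% "% time" that gprof reports for objR<true>::R_impl:

```latex
\frac{46.72\,\text{s}}{53.64\,\text{s}} \approx 0.871 = 87.1\%
```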

Votes: 14

Stack Overflow user

Answered 2011-06-11 23:49:22

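gprof is an instrumenting profiler; callgrind is a sampling profiler. With an instrumenting profiler you incur overhead on every function entry and exit, which can skew the profile, especially if you have relatively small functions that are called many times. Sampling profilers tend to be more accurate: they slow down execution of the whole program slightly, but this tends to affect all functions by the same relative amount.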

Try the 30-day free evaluation of Zoom from RotateRight; I suspect it will give you a profile more in line with callgrind than with gprof.

Votes: 9

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/6316697
