Q: gprof gives no output

Stack Overflow user

Asked 2017-10-08 13:48:10

3 answers · 4.9K views · 0 followers · score 6

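I am trying to use gprof to profile some numerical code I am developing, but gprof seems unable to collect any data from my program. Here is my command line: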

g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test

The gmon.out file is created, but it seems to contain no data. When I run

gprof -b fftw_test gmon.out > gprof.out

all I get is

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    


                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name


Index by function name

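Any insights?

The code does a lot of work; it does not simply call FFTW routines. It has functions that compute certain complex coefficients, functions that multiply the input data by those coefficients, and so on.

Edit: added sample code and results.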

#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
   std::srand( std::time( 0 ) );

   double sum = 0.0;

   for ( int i = 0; i < RAND_MAX; ++i )
      sum += std::rand() / ( double ) RAND_MAX;

   std::cout << sum << '\n';

   return 0;
}

Command line:

$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && ./gprof_test
1.07374e+09
$ gprof -b gprof_test gmon.out > gprof.out
$ cat gprof.out

Result:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    


                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name


Index by function name

That's all.

gprof

g++

profiling

3 Answers

Stack Overflow user

Answered 2018-08-07 09:14:52

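If you are using gcc 6, you are most likely running into this bug (note that the bug is not specific to Debian; it depends on how gcc was built). A workaround is simply to compile with the -no-pie option, which disables generation of a position-independent executable.

If you want to learn more about PIE, the page linked in the original answer is a good place to start.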

Score: 6

Stack Overflow user

Answered 2022-01-05 19:31:55

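I think the problem comes from the O3 optimization level you are using. With gcc-8.4.0 I get nothing with O3, limited data with O2 (for example, the number of function calls is missing), and a proper profile with O1 and O0.

This appears to have been a known bug in older versions of gcc, but I could not find any source describing such a problem in newer versions. I can only guess that it is either a compiler bug or that the more aggressive optimizations prevent some of the performance data from being collected.

Score: 1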

Stack Overflow user

Answered 2017-10-26 00:37:09

gprof seems unable to collect any data from my program. Here is my command line:

g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test

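Your program uses the fftw library and probably consists almost entirely of fftw library calls. What is its running time? Your program may simply be too fast to be profiled with gprof. Update: gprof may also be unable to see into the library, because the library was compiled without gprof profiling enabled.

GNU gprof has two parts. First, it instruments function calls in the C/C++ files that are compiled with the -pg option (using mcount function calls - https://en.wikipedia.org/wiki/Gprof) to collect caller/callee information. Second, it links an additional profiling library into your executable that adds periodic sampling, to find out which code takes more time to execute. Sampling is done with profil (setitimer). setitimer-based profiling has limited resolution and cannot resolve intervals shorter than 10 ms or 1 ms (100 or 1000 samples per second).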

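In your example, the fftw library was probably compiled without instrumentation, so it contains no mcount calls. It can still be captured by the sampling part, but only for the main thread of the program (https://en.wikipedia.org/wiki/Gprof - "typically it only profiles the main thread of application").

The perf profiler has no mcount instrumentation (when recording with the -g option, it obtains callee/caller information from stack unwinding), but it has a much better statistical/sampling variant (it can use hardware PMU counters), has no 100 or 1000 Hz limit, and correctly supports (profiles) threads. Try perf record -F1000 ./fftw_test (sampling at 1 kHz) followed by perf report or perf report > report.txt. There are also some GUI/HTML front ends available: https://github.com/KDAB/hotspot https://github.com/jrfonseca/gprof2dot

For a better setitimer-style profiler, look at the "CPU PROFILER" in google-perftools: https://github.com/gperftools/gperftools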

======

With your test I got some gprof results on a Debian 8.6 machine (Linux kernel version 3.16.0-4-amd64) with g++ (Debian 4.9.2-10); gprof is "GNU gprof (GNU Binutils for Debian) 2.27".

$ cat gprof_test.cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
5.00069e+06
real    0m0.992s
$ gprof -b gprof_test gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_main

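So gprof captured no time samples in this 1-second example, and it has no information about calls inside the library (the library was compiled without gprof instrumentation). After adding some wrapper functions and disabling inlining, I got some data from gprof, but the library time is still not accounted for (it sees 0.72 seconds of a 2-second run):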

$ cat *cpp
#include <cstdlib>
#include <ctime>
#include <iostream>

int rand_wrapper1()
{
  return std::rand();
}
int rand_scale1()
{
  return rand_wrapper1() / ( double ) RAND_MAX;
}
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
    sum+= rand_scale1();
//      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -fno-inline -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
real    0m2.345s
$ gprof -b gprof_test gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 80.02      0.57     0.57                             rand_scale1()
 19.29      0.71     0.14 100000000     1.37     1.37  rand_wrapper1()
  2.14      0.72     0.02                             frame_dummy
  0.00      0.72     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z13rand_wrapper1v
  0.00      0.72     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int) [clone .constprop.0]


                        Call graph


granularity: each sample hit covers 2 byte(s) for 1.39% of 0.72 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     97.9    0.57    0.14                 rand_scale1() [1]
                0.14    0.00 100000000/100000000     rand_wrapper1() [2]
-----------------------------------------------
                0.14    0.00 100000000/100000000     rand_scale1() [1]
[2]     19.0    0.14    0.00 100000000         rand_wrapper1() [2]

perf can see all the parts:

$ perf record ./gprof_test
0
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.388 MB perf.data (~16954 samples) ]
$ perf report |more   
# Samples: 9K of event 'cycles'
# Event count (approx.): 7373484231
#
# Overhead     Command      Shared Object                     Symbol
# ........  ..........  .................  .........................
#
    25.91%  gprof_test  gprof_test         [.] rand_scale1()
    21.65%  gprof_test  libc-2.19.so       [.] __mcount_internal
    13.88%  gprof_test  libc-2.19.so       [.] _mcount
    12.54%  gprof_test  gprof_test         [.] main
     9.35%  gprof_test  libc-2.19.so       [.] __random_r
     8.40%  gprof_test  libc-2.19.so       [.] __random
     3.97%  gprof_test  gprof_test         [.] rand_wrapper1()
     2.79%  gprof_test  libc-2.19.so       [.] rand
     1.41%  gprof_test  gprof_test         [.] mcount@plt
     0.03%  gprof_test  [kernel.kallsyms]  [k] memset

Score: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/46627934
