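Is there a way to get kernel execution time as a metric in nvprof?
For example, to get DRAM read transactions, I type:
    nvprof --metrics dram_read_transactions ./myprogram
My question is: is there something like
    nvprof --metrics execution_time ./myprogram
I would like to collect a small set of metrics in a single command line, without having to run
    nvprof ./myprogram
as a separate command.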
Answered 2018-10-05 20:57:58
I believe you are looking for:
    nvprof --print-gpu-trace ./myprogram
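Answered 2018-10-05 21:03:25
You should read this post on NVIDIA's "CUDA Pro Tip" blog:
CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler
It walks you through the basics of using nvprof to profile and time an application. Specifically, if you run:
    nvprof --print-gpu-trace ./nbody --benchmark -numdevices=2 -i=1
(the example is an n-body physics simulator), the output will include something like this: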
...
==4125== Profiling application: ./nbody --benchmark -numdevices=2 -i=1
==4125== Profiling result:
Start     Duration  Grid Size  Block Size  Regs*  SSMem*  DSMem*    Size  Throughput  Device           Context  Stream  Name
260.78ms  864ns     -          -           -      -       -         4B    4.6296MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
260.79ms  960ns     -          -           -      -       -         4B    4.1667MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
260.93ms  896ns     -          -           -      -       -         4B    4.4643MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
260.94ms  672ns     -          -           -      -       -         4B    5.9524MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
268.03ms  1.3120us  -          -           -      -       -         8B    6.0976MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
268.04ms  928ns     -          -           -      -       -         8B    8.6207MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
268.19ms  864ns     -          -           -      -       -         8B    9.2593MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
268.19ms  800ns     -          -           -      -       -         8B    10.000MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
274.59ms  2.2887ms  (52 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           Tesla K20c (0)   2        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [242]
274.67ms  981.47us  (32 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           GeForce GTX 680  1        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [257]
276.94ms  2.3146ms  (52 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           Tesla K20c (0)   2        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [275]
276.99ms  979.36us  (32 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           GeForce GTX 680  1        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [290]

The Duration column gives the timing of each of your kernel launches.
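The Duration values in a trace like this carry mixed unit suffixes (ns, us, ms), so they have to be normalized before you can add them up. Here is a minimal sketch in Python that does this for the four integrateBodies rows from the nbody trace in this answer. The helper name to_seconds and the dictionary layout are my own, purely for illustration; nothing here is part of nvprof.

```python
# Multipliers for the unit suffixes that appear in the Duration column.
UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

def to_seconds(text):
    """Convert a duration string such as '2.2887ms' to seconds (hypothetical helper)."""
    # Try the two-letter suffixes first so 'ms' is not mistaken for plain 's'.
    for suffix in ("ns", "us", "ms", "s"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized duration: {text!r}")

# Durations of the integrateBodies launches, copied from the trace, per device.
kernel_durations = {
    "Tesla K20c": ["2.2887ms", "2.3146ms"],
    "GeForce GTX 680": ["981.47us", "979.36us"],
}

# Total kernel time per device, in seconds.
totals = {
    device: sum(to_seconds(d) for d in durations)
    for device, durations in kernel_durations.items()
}

for device, total in totals.items():
    print(f"{device}: {total * 1e3:.4f} ms")
```

For these rows the totals come out at roughly 4.60 ms on the Tesla K20c and 1.96 ms on the GeForce GTX 680.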
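It is also worth running nvprof --help and spending 5-10 minutes reading through the options; for example, if you want to process the trace in a script, you will find the switch that prints the trace in CSV format.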
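As a rough illustration of the scripting route: the CSV switch is --csv, and --log-file sends nvprof's output to a file instead of stderr. The sketch below sums kernel durations per device and kernel from such a trace. Note the assumptions: the sample text is made up (a reduced column set with values adapted from the trace in this answer, and a shortened kernel name), and the layout I assume (message lines starting with "==", then a header row, then a units row, then data rows) should be checked against what your nvprof version actually prints. The function name kernel_time_by_device is hypothetical.

```python
import csv
from collections import defaultdict

# Hypothetical, simplified sample of `nvprof --print-gpu-trace --csv` output.
# Assumed layout: "==PID==" message lines, header row, units row, data rows.
SAMPLE = '''==4125== Profiling application: ./nbody --benchmark -numdevices=2 -i=1
==4125== Profiling result:
"Start","Duration","Device","Context","Stream","Name"
ms,us,,,,
260.78,0.864,"Tesla K20c (0)",2,2,"[CUDA memcpy HtoD]"
274.59,2288.7,"Tesla K20c (0)",2,2,"integrateBodies"
274.67,981.47,"GeForce GTX 680",1,2,"integrateBodies"
276.94,2314.6,"Tesla K20c (0)",2,2,"integrateBodies"
276.99,979.36,"GeForce GTX 680",1,2,"integrateBodies"
'''

def kernel_time_by_device(text):
    """Sum the Duration column per (device, kernel name), skipping memory operations."""
    # Drop blank lines and nvprof's "==PID==" message lines; keep the CSV body.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("==")]
    reader = csv.reader(lines)
    header = next(reader)
    units = dict(zip(header, next(reader)))  # e.g. {"Duration": "us", ...}
    totals = defaultdict(float)
    for row in reader:
        record = dict(zip(header, row))
        # Memory copies and memsets show up with names like "[CUDA memcpy HtoD]".
        if record["Name"].startswith("[CUDA mem"):
            continue
        totals[(record["Device"], record["Name"])] += float(record["Duration"])
    return dict(totals), units["Duration"]

totals, unit = kernel_time_by_device(SAMPLE)
for (device, name), total in sorted(totals.items()):
    print(f"{device}  {name}: {total:.2f} {unit}")
```

To use it on a real run, you could capture the trace with something like nvprof --print-gpu-trace --csv --log-file trace.csv ./myprogram and pass the file's contents to the function. The unit is read from the units row rather than hard-coded because nvprof may pick a different time unit from one run to the next.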
https://stackoverflow.com/questions/52472188