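The Fermi architecture is out of the question altogether (see the release notes: https://docs.nvidia.com/nsight-compute/2020.3/ReleaseNotes/index.html ). Detailed documentation is also available in the Profiling Guide: https://docs.nvidia.com/nsight-compute/2020.3/ProfilingGuide/index.html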
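(https://developer.nvidia.com/nsight-compute)

Summary

Matrix-matrix multiplication is the most frequently used operation in neural network training and inference. For a network with n layers, one training step needs close to 3n matrix multiplications: each fully connected layer performs one in the forward pass and up to two in the backward pass (one for the weight gradient, one for the input gradient).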
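To make the "close to 3n" count concrete, here is a minimal sketch in pure Python. It assumes a stack of bias-free, activation-free dense layers, and the names `matmul`, `transpose`, and `train_step_matmuls` are hypothetical helpers written for this illustration, not part of any NVIDIA tool or library. It simply counts how many matrix products one forward and backward pass performs.

```python
# Count matrix multiplications in one training step of an n-layer dense network.
# Assumption: no biases or activations, so every layer is a single matrix product.
matmul_calls = 0

def matmul(a, b):
    """Naive product of two matrices stored as lists of rows; bumps a global counter."""
    global matmul_calls
    matmul_calls += 1
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(r) for r in zip(*m)]

def train_step_matmuls(n_layers, width=3, batch=2):
    """Run one forward and backward pass and return the number of matmul calls."""
    global matmul_calls
    matmul_calls = 0
    weights = [[[0.1] * width for _ in range(width)] for _ in range(n_layers)]
    activations = [[[1.0] * width for _ in range(batch)]]

    # Forward pass: one product per layer, Y = X W.
    for w in weights:
        activations.append(matmul(activations[-1], w))

    # Backward pass, starting from a dummy gradient of the loss w.r.t. the output.
    grad = [[1.0] * width for _ in range(batch)]
    for i in reversed(range(n_layers)):
        grad_w = matmul(transpose(activations[i]), grad)  # dW = X^T dY
        if i > 0:
            # dX = dY W^T; the first layer's input gradient is never needed.
            grad = matmul(grad, transpose(weights[i]))
    return matmul_calls

if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, train_step_matmuls(n))
```

Under these assumptions the count is 3n - 1 rather than exactly 3n, because the input gradient of the first layer is skipped; that is one way to read "close to 3n".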
libpthread-stubs0-dev libthrust-dev libvdpau-dev libx11-dev libxau-dev libxcb1-dev libxdmcp-dev node-html5shiv nsight-compute