According to Nvidia's official documentation, if a CUDA application is built to include PTX, then because PTX is forward-compatible, it can run on any GPU whose compute capability is higher than the compute capability the PTX was generated for. So I tried to find out whether torch-1.7.0+cu101 was compiled with PTX embedded in its binaries. It appears it was compiled with the nvcc flag "-gencode=arch=compute_xx,code=sm_xx" set in CMakeLists.txt, and I assumed this flag means the compiled product contains PTX. However, when I try to use PyTorch 1.7 with CUDA 10.1 on an A100, I always get an error.
>>> import torch
>>> torch.zeros(1).cuda()
/data/miniconda3/lib/python3.7/site-packages/torch/cuda/__init__.py:104: UserWarning:
A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the A100-SXM4-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 179, in __repr__
return torch._tensor_str._str(self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 372, in _str
return _str_intern(self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 352, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/data/miniconda3/lib/python3.7/site-packages/torch/_tensor_str.py", line 89, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device

So I really want to know why the "PTX forward compatibility" rule does not apply to PyTorch here. Other answers only say to use CUDA 11 or later, and I know that works, but they don't explain the real reason why PyTorch built against CUDA 10.1 does not work on the A100. I tried the CUDA 10.1 samples from the toolkit, and those small demo applications run fine:
[Matrix Multiply Using CUDA] - Starting...
MapSMtoCores for SM 8.0 is undefined. Default to use 64 Cores/SM
GPU Device 0: "A100-SXM4-40GB" with compute capability 8.0
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 4286.91 GFlop/s, Time= 0.031 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

I would be very grateful if someone could give me an answer.
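For reference, the fatbinary selection rule the question is asking about can be sketched as follows. This is an illustrative model, not the CUDA driver's actual code: a kernel can launch if the binary embeds SASS (native machine code) for the device's exact architecture, or embeds PTX for an architecture no newer than the device, which the driver can then JIT-compile forward. One nuance worth noting: `-gencode=arch=compute_xx,code=sm_xx` asks nvcc to embed only SASS for sm_xx; PTX is embedded only when `code=compute_xx` is requested, which would explain the error if the cu101 wheel carries no PTX usable on sm_80.

```python
# Illustrative (simplified) model of CUDA fatbinary kernel selection.
# sass_archs: compute capabilities with native SASS embedded (code=sm_xx)
# ptx_archs:  compute capabilities with PTX embedded (code=compute_xx)
def kernel_available(device_cc, sass_archs, ptx_archs):
    if device_cc in sass_archs:
        return True  # exact SASS match: runs natively
    # PTX is forward-compatible: the driver can JIT it for a newer device
    return any(ptx_cc <= device_cc for ptx_cc in ptx_archs)

# Architectures listed in the warning above: sm_37 sm_50 sm_60 sm_70 sm_75
wheel_sass = {37, 50, 60, 70, 75}
print(kernel_available(80, wheel_sass, set()))  # False: "no kernel image ..."
print(kernel_available(80, wheel_sass, {75}))   # True, had PTX for 7.5 been embedded
```

The CUDA samples work on the A100 because they are compiled locally (or ship PTX the driver can JIT), so a usable image exists for sm_80.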
Posted on 2022-03-04 03:10:47
After @talonmies' reminder, I also posted the same question on discuss.pytorch.org.
The answer is that PyTorch 1.7 uses cuDNN 7, which is not compatible with the A100. The Nvidia Ampere architecture is not supported by cuDNN 7.6.5; the only cuDNN versions that support Ampere are cuDNN 8 and later.
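The constraint in this answer can be expressed as a small version check. The mapping below is a sketch based only on the statement above (Ampere/sm_80 requires cuDNN 8 or later, while the pre-Ampere architectures PyTorch 1.7 supports ran on cuDNN 7); it is not an official compatibility table:

```python
# Minimum cuDNN major version per compute capability (illustrative, derived
# from the answer above; not an official Nvidia support matrix).
MIN_CUDNN_MAJOR = {37: 7, 50: 7, 60: 7, 70: 7, 75: 7, 80: 8}

def cudnn_supports(device_cc, cudnn_major):
    # Unknown (newer) architectures conservatively require cuDNN 8+.
    return cudnn_major >= MIN_CUDNN_MAJOR.get(device_cc, 8)

print(cudnn_supports(80, 7))  # False: PyTorch 1.7 ships cuDNN 7, fails on A100
print(cudnn_supports(80, 8))  # True: cuDNN 8 supports Ampere
```

So even if the cu101 wheel had embedded forward-compatible PTX for its own kernels, the bundled cuDNN 7 library would still be unable to run on an sm_80 device.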
https://stackoverflow.com/questions/71333076