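I am running Tensorflow 0.8 with CUDA 7.5 and CUDNN v5 on Tesla K80s. Everything works fine, except that some of the devices cannot access each other as peers.
The warning log is listed below. Thanks.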
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 2 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 2 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 3 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 3 to device ordinal 1

Posted on 2016-05-31 17:39:54
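I'd bet you have a multi-socket configuration in which the K80s do not share the same PCIe root complex.

In that case, peer access from GPU0 to GPU1 is allowed, but peer access from GPU0 to GPU2 or GPU3 is not.
Tensorflow should be able to detect this kind of system and fall back to manual copies between the GPUs.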
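
The grouping described above can be read straight off the log in the question. Below is a minimal sketch that parses those lines and groups the devices that are not blocked from each other. The function names `blocked_pairs` and `peer_groups` are hypothetical helpers written for this illustration, not part of Tensorflow or CUDA, and the device count of 4 is an assumption taken from the ordinals that appear in the log.

```python
import re

# The eight log lines from the question, copied verbatim.
LOG = """\
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 2 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 2 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 3 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 3 to device ordinal 1
"""

PATTERN = re.compile(
    r"cannot enable peer access from device ordinal (\d+) to device ordinal (\d+)"
)


def blocked_pairs(log_text):
    """Return the set of (source, target) ordinals the log reports as blocked."""
    return {(int(src), int(dst)) for src, dst in PATTERN.findall(log_text)}


def peer_groups(num_devices, blocked):
    """Greedily group devices so that no pair inside a group is blocked.

    Assumption: any pair not mentioned in the log can use peer access.
    """
    groups = []
    for dev in range(num_devices):
        for group in groups:
            if all(
                (dev, other) not in blocked and (other, dev) not in blocked
                for other in group
            ):
                group.append(dev)
                break
        else:
            # No existing group accepts this device, so start a new one.
            groups.append([dev])
    return groups


if __name__ == "__main__":
    blocked = blocked_pairs(LOG)
    # num_devices=4 is assumed from the ordinals 0..3 seen in the log.
    print(peer_groups(4, blocked))
```

For this log the devices fall into two groups, ordinals 0 and 1 in one and ordinals 2 and 3 in the other, which matches the picture of two sets of GPUs sitting under different PCIe root complexes.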
https://stackoverflow.com/questions/37550136