I have a CNN, and when I run it I get this error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[60000,32,393,2] and type float on /job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
[[{{node sequential/conv2d_2/Relu}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[StatefulPartitionedCall/_31]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[60000,32,393,2] and type float on /job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
[[{{node sequential/conv2d_2/Relu}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
srun: error: gpu01: task 0: Exited with exit code 1
What should I do to fix this?
Posted on 2022-06-23 08:19:01
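Simply put, a tensor of shape [60000,32,393,2] is too large for your GPU. You need to shrink it, either by reducing the batch size or by reducing the image dimensions.
The error means that during training the network tried to allocate space for the output of a convolution layer (conv2d_2), but the GPU had no memory left for it.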
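To get a feel for the numbers, here is a rough back-of-the-envelope estimate of that one tensor. It assumes "type float" in the log means 32-bit floats (4 bytes per element); `tensor_bytes` is just an illustrative helper, not a TensorFlow function, and the batch size of 64 is an arbitrary comparison value.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor (4 bytes per element assumes float32)."""
    return prod(shape) * bytes_per_element


# Shape taken from the error message: the first dimension is the batch.
full_batch = tensor_bytes((60000, 32, 393, 2))
# Same activation with a hypothetical batch size of 64, for comparison.
small_batch = tensor_bytes((64, 32, 393, 2))

print(f"batch of 60000: {full_batch / 2**30:.2f} GiB")
print(f"batch of 64:    {small_batch / 2**20:.2f} MiB")
```

That is roughly 5.6 GiB for a single activation, before counting the other layers, the gradients, and the weights, so it is not surprising that the allocator gives up.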
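Alternatively, you can change the architecture of the CNN and reduce the number of kernels (filters) in the layers.
In your case, I would first try lowering the training batch size (currently 60000).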
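A minimal sketch of what that could look like, assuming the model is a compiled `tf.keras` model and that the 60000 comes from feeding the whole dataset as one batch. The names `train_in_batches`, `x_train`, and `y_train` are placeholders, since the question does not show the training code, and 64 is only a starting value to tune.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of mini-batches needed to cover the dataset once."""
    return math.ceil(num_samples / batch_size)


def train_in_batches(model, x_train, y_train, batch_size=64, epochs=1):
    """Train with small mini-batches instead of one huge batch.

    `model` is assumed to be a compiled tf.keras model; Keras' `fit`
    accepts `batch_size` and splits the data into mini-batches itself.
    """
    return model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)


# With 60000 samples and a batch size of 64, each epoch runs this many steps:
print(steps_per_epoch(60000, 64))
```

If the error shows up during inference rather than training, `model.predict` takes a `batch_size` argument as well. If it still runs out of memory at 64, keep halving the batch size until it fits.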
https://stackoverflow.com/questions/72723475