This is strange, because I have already trained several models with nnUNet, and now, all of a sudden, I get this error before the first epoch even starts. The full error is:
"""
Traceback (most recent call last):
File "/home/viberti/miniconda3/bin/nnUNet_train", line 8, in <module>
sys.exit(main())
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/run/run_training.py", line 179, in main
trainer.run_training()
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 440, in run_training
ret = super().run_training()
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainer.py", line 317, in run_training
super(nnUNetTrainer, self).run_training()
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/network_trainer.py", line 464, in run_training
l = self.run_iteration(self.tr_gen, True)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 247, in run_iteration
output = self.network(data)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/network_architecture/generic_UNet.py", line 391, in forward
x = self.conv_blocks_context[d](x)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/network_architecture/generic_UNet.py", line 142, in forward
return self.blocks(x)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/network_architecture/generic_UNet.py", line 68, in forward
return self.lrelu(self.instnorm(x))
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/instancenorm.py", line 72, in forward
return self._apply_instance_norm(input)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/instancenorm.py", line 32, in _apply_instance_norm
return F.instance_norm(
File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 2466, in instance_norm
return torch.instance_norm(
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 15.78 GiB total capacity; 1.71 GiB already allocated; 6.50 MiB free; 1.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception in thread Thread-6:
Traceback (most recent call last):
File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
raise RuntimeError("Abort event was set. So someone died and we should end this madness. \nIMPORTANT: "
RuntimeError: Abort event was set. So someone died and we should end this madness.
IMPORTANT: This is not the actual error message! Look further up to see what caused the error. Please also check whether your RAM was full
Exception in thread Thread-5:
Traceback (most recent call last):
File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
raise RuntimeError("Abort event was set. So someone died and we should end this madness. \nIMPORTANT: "
RuntimeError: Abort event was set. So someone died and we should end this madness.
IMPORTANT: This is not the actual error message! Look further up to see what caused the error. Please also check whether your RAM was full
"""
I am working in Jupyter Lab on a Tesla V100 GPU. Reducing the batch size, as is suggested in many similar cases, does not apply here, because nnUNet sets it automatically.
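One thing the error message itself suggests is setting the allocator's `max_split_size_mb` option via `PYTORCH_CUDA_ALLOC_CONF` to reduce fragmentation. A minimal sketch, assuming an nnUNet v1 command line; the task name, fold, and the 128 MiB value are placeholders to adapt:

```shell
# Reduce CUDA allocator fragmentation, as hinted at by the OOM message.
# 128 is only a starting value (in MiB); tune it for your workload.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Hypothetical invocation; substitute your own task name and fold.
nnUNet_train 3d_fullres nnUNetTrainerV2 TaskXXX_MYTASK 0
```

Separately, the numbers in the traceback are suspicious: only 6.50 MiB free although PyTorch has reserved just 1.73 GiB of the card's 15.78 GiB, which suggests other processes (e.g. a stale notebook kernel) may be holding GPU memory; `nvidia-smi` will show them.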
Posted on 2022-08-29 22:20:46
Maybe you should try
torch.cuda.empty_cache(), or restart the notebook kernel.
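A minimal sketch of that suggestion, with the helper name being my own. Note that `empty_cache()` only returns cached, unreferenced blocks to the driver; tensors still referenced in the notebook are not freed, so deleting them or restarting the kernel may still be necessary:

```python
import torch

def release_cached_gpu_memory():
    """Free blocks cached by PyTorch's CUDA allocator.

    Only cached memory that is no longer referenced is released; live
    tensors must be deleted (or the kernel restarted) before their
    memory can be reclaimed.
    """
    if not torch.cuda.is_available():
        print("No CUDA device available")
        return
    torch.cuda.empty_cache()
    # Compare memory held by live tensors vs. memory reserved overall.
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")

release_cached_gpu_memory()
```

If the OOM persists even after this, restarting the kernel is the reliable way to drop every tensor the notebook still references.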
https://stackoverflow.com/questions/73534950