
nnUNet runs out of GPU memory when starting training

Asked by a Stack Overflow user on 2022-08-29 21:39:16
1 answer · 125 views · 0 votes

This is strange, because I have already trained several models with nnUNet, and now, suddenly, I get this error before the first epoch even starts. The full error is:

Traceback (most recent call last):
  File "/home/viberti/miniconda3/bin/nnUNet_train", line 8, in <module>
    sys.exit(main())
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/run/run_training.py", line 179, in main
    trainer.run_training()
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 440, in run_training
    ret = super().run_training()
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainer.py", line 317, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/network_trainer.py", line 464, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/training/network_training/nnUNetTrainerV2.py", line 247, in run_iteration
    output = self.network(data)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/network_architecture/generic_UNet.py", line 391, in forward
    x = self.conv_blocks_context[d](x)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/network_architecture/generic_UNet.py", line 142, in forward
    return self.blocks(x)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/network_architecture/generic_UNet.py", line 68, in forward
    return self.lrelu(self.instnorm(x))
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/instancenorm.py", line 72, in forward
    return self._apply_instance_norm(input)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/modules/instancenorm.py", line 32, in _apply_instance_norm
    return F.instance_norm(
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 2466, in instance_norm
    return torch.instance_norm(
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 15.78 GiB total capacity; 1.71 GiB already allocated; 6.50 MiB free; 1.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
    raise RuntimeError("Abort event was set. So someone died and we should end this madness. \nIMPORTANT: "
RuntimeError: Abort event was set. So someone died and we should end this madness. 
IMPORTANT: This is not the actual error message! Look further up to see what caused the error. Please also check whether your RAM was full
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/viberti/miniconda3/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/viberti/miniconda3/lib/python3.9/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
    raise RuntimeError("Abort event was set. So someone died and we should end this madness. \nIMPORTANT: "
RuntimeError: Abort event was set. So someone died and we should end this madness. 
IMPORTANT: This is not the actual error message! Look further up to see what caused the error. Please also check whether your RAM was full


I am working in Jupyter Lab with a Tesla V1 GPU. Reducing the batch size, as suggested in many similar cases, is not an option here, because nnUNet adjusts it automatically.
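The numbers in the OOM message itself hint at the cause: PyTorch in this process has reserved only 1.73 GiB, yet just 6.50 MiB of the card's 15.78 GiB is free, so roughly 14 GiB must be held outside this process, e.g. by another process or a stale Jupyter kernel. A quick sanity check on the arithmetic (values copied from the traceback above):

```python
# Accounting for the figures in the CUDA OOM message. If "reserved by PyTorch"
# plus "free" is far below the card's total capacity, the remainder is held
# outside this process, and restarting the notebook kernel is the likely fix.

GIB = 1024 ** 3
MIB = 1024 ** 2

total = 15.78 * GIB    # "15.78 GiB total capacity"
reserved = 1.73 * GIB  # "1.73 GiB reserved in total by PyTorch"
free = 6.50 * MIB      # "6.50 MiB free"

held_elsewhere = total - reserved - free
print(f"memory not owned by this process: {held_elsewhere / GIB:.2f} GiB")
# → memory not owned by this process: 14.04 GiB
```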


1 Answer

Stack Overflow user

Answered on 2022-08-29 22:20:46

Maybe you should try:

torch.cuda.empty_cache()

or reload the notebook.
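Expanding on this suggestion: `torch.cuda.empty_cache()` only returns blocks that PyTorch has cached but not currently allocated, so it helps when reserved memory far exceeds allocated memory; it cannot reclaim memory held by another process or a stale kernel, in which case restarting the notebook kernel is required. A minimal sketch (the guards are assumptions added so it also runs on a machine without a GPU or without PyTorch installed):

```python
import importlib.util

def release_cuda_cache():
    """Try to return PyTorch's cached-but-unused CUDA blocks to the driver."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "no CUDA device"
    # Frees cached blocks held by the allocator; live tensors are untouched.
    torch.cuda.empty_cache()
    return "cache released"

print(release_cuda_cache())
```

If the error persists even after a kernel restart, the traceback itself suggests setting the allocator option `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:...` (with a value of your choice, e.g. 128) in the environment before launching `nnUNet_train`, to reduce fragmentation.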

0 votes
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's IT-domain engine.
Original link: https://stackoverflow.com/questions/73534950
