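I followed this guide to start my PyTorch Lightning project on a Google Colab TPU. So I installed:
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
then
!pip install pytorch-lightning
and then
!pip install torch torchvision torchaudio
!pip install -r requirements.txt
After installing the project requirements, I restarted the runtime as requested and re-ran the cloud-tpu-client install, the pytorch-lightning install, and these two commands from the top. That ran without problems.
However, right after the TPU starts up and reports PyTorch version 1.9, I get the following import error: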
WARNING:root:TPU has started up successfully with version pytorch-1.9
Traceback (most recent call last):
File "synthesizer_train.py", line 2, in <module>
from synthesizer.train import train
File "/content/Real-Time-Voice-Cloning/synthesizer/train.py", line 6, in <module>
from synthesizer.models.tacotron import Tacotron
File "/content/Real-Time-Voice-Cloning/synthesizer/models/tacotron.py", line 7, in <module>
import pytorch_lightning as pl
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
from pytorch_lightning.callbacks import Callback # noqa: E402
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
from pytorch_lightning.callbacks.base import Callback
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/base.py", line 26, in <module>
from pytorch_lightning.utilities.types import STEP_OUTPUT
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
from pytorch_lightning.utilities.apply_func import move_data_to_device # noqa: F401
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 26, in <module>
from pytorch_lightning.utilities.imports import _compare_version, _TORCHTEXT_AVAILABLE
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py", line 101, in <module>
from pytorch_lightning.utilities.xla_device import XLADeviceUtils # noqa: E402
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/xla_device.py", line 24, in <module>
import torch_xla.core.xla_model as xm
File "/usr/local/lib/python3.7/dist-packages/torch_xla/__init__.py", line 142, in <module>
import _XLAC
ImportError: /usr/local/lib/python3.7/dist-packages/_XLAC.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at13_foreach_erf_EN3c108ArrayRefINS_6TensorEEE
The Trainer is started with the tpu_cores=8 flag.
The model had already run on CPU and on GPU (in another session).
I tried downgrading PyTorch to 1.9 (the same version shown at TPU startup), because Colab ships torch 1.10.0+cu111, but that produced a different error:
WARNING:root:TPU has started up successfully with version pytorch-1.9
Traceback (most recent call last):
File "synthesizer_train.py", line 2, in <module>
from synthesizer.train import train
File "/content/Real-Time-Voice-Cloning/synthesizer/train.py", line 6, in <module>
from synthesizer.models.tacotron import Tacotron
File "/content/Real-Time-Voice-Cloning/synthesizer/models/tacotron.py", line 7, in <module>
import pytorch_lightning as pl
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
from pytorch_lightning.callbacks import Callback # noqa: E402
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
from pytorch_lightning.callbacks.base import Callback
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/base.py", line 26, in <module>
from pytorch_lightning.utilities.types import STEP_OUTPUT
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
from pytorch_lightning.utilities.apply_func import move_data_to_device # noqa: F401
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 29, in <module>
if _compare_version("torchtext", operator.ge, "0.9.0"):
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py", line 54, in _compare_version
pkg = importlib.import_module(package)
File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/usr/local/lib/python3.7/dist-packages/torchtext/__init__.py", line 5, in <module>
from . import vocab
File "/usr/local/lib/python3.7/dist-packages/torchtext/vocab/__init__.py", line 11, in <module>
from .vocab_factory import (
File "/usr/local/lib/python3.7/dist-packages/torchtext/vocab/vocab_factory.py", line 4, in <module>
from torchtext._torchtext import (
ImportError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZTVN5torch3jit6MethodE
What can I do to train the model on a TPU?
Thanks a lot.
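Posted on 2021-11-29 13:01:49
It turns out the same problem had already been described, and the suggested solution did work for me.
In detail, they suggest downgrading PyTorch to 1.9.0+cu111 (note the +cu111) after installing torch_xla.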
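To make the reasoning behind the downgrade concrete: both tracebacks end in an undefined symbol inside a compiled extension, which usually means the extension was built against a different PyTorch release than the one installed. Here the torch_xla wheel is a 1.9 build while Colab ships torch 1.10.0+cu111. The sketch below compares only the major.minor part of two version strings; release_pair and same_release are hypothetical helper names for illustration, not part of torch or torch_xla, and matching major.minor is my simplifying assumption about what "compatible" means here.

```python
import re


def release_pair(version: str):
    """Return (major, minor) from a version string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_release(torch_version: str, xla_version: str) -> bool:
    """True if both versions share major.minor (assumed compatibility rule)."""
    return release_pair(torch_version) == release_pair(xla_version)


# Colab's default torch against the torch_xla 1.9 wheel from the question
print(same_release("1.10.0+cu111", "1.9"))  # False
# After pinning torch to 1.9.0+cu111
print(same_release("1.9.0+cu111", "1.9"))  # True
```

Note that the comparison is numeric, so 1.10 is not mistaken for 1.1.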
So these are the steps I use to start my Lightning project on Google Colab with a TPU:
!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchtext==0.10.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html
Then the pip installs for the project:
!pip install torch torchvision torchaudio pytorch-lightning
!pip install -r requirements.txt
It still worked even though I had to restart the runtime after this last step.
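Since the later pip steps and the runtime restart could in principle change what is installed, it may be worth confirming that the pinned versions survived before importing pytorch_lightning. The following is a minimal sketch that reads package metadata only, so it does not trigger the failing import; installed_versions is a hypothetical helper, and the importlib_metadata backport is assumed to be available on a Python 3.7 runtime.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    # Python 3.7, as in the tracebacks; the backport is assumed to be installed
    import importlib_metadata as metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names below are the ones used in the pip commands above
wanted = ["torch", "torch_xla", "torchvision", "torchtext", "pytorch-lightning"]
for name, version in installed_versions(wanted).items():
    print(f"{name}: {version or 'not installed'}")
```

With the steps above, torch should report 1.9.0+cu111; if it shows 1.10.x, the downgrade was overwritten and needs to be re-run.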
https://stackoverflow.com/questions/70136356