The following code runs without error on the PyTorch nightly build (1.5.0.dev20200206), but as soon as I installed the stable 1.5 build, the RNN forward method defined below started throwing an error:
def forward(self, sequence):
    print('Sequence shape:', sequence.shape)
    sequence = sequence.clone().view(len(sequence), 1, -1)
    print("flattened shape: ", sequence.shape)
    lstm_out, hidden = self.lstm(
        sequence, self.hidden
    )
    print(lstm_out.shape)
    out_space = self.hidden2out(lstm_out[:, -1])
    self.hidden = hidden
    print("hiddens")
    print(hidden[0].shape)
    print(hidden[1].shape)
    print(" out_space: ", out_space.shape)
    out_scores = torch.sigmoid(out_space)
    print("out_scores: ", out_scores.shape)
    out = out_scores.squeeze()
    print(out.shape)
    return out

I added the clone() call to keep view() from operating on the same memory in place and to make the reassignment of sequence explicitly out-of-place. However, I still get the following error:
Sequence shape: torch.Size([200, 19, 62])
flattened shape: torch.Size([200, 1, 1178])
torch.Size([200, 1, 8])
hiddens
torch.Size([1, 1, 8])
torch.Size([1, 1, 8])
out_space: torch.Size([200, 1])
out_scores: torch.Size([200, 1])
torch.Size([200])
Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
File "main.py", line 240, in <module>
main_loop(args)
File "main.py", line 115, in main_loop
train.run(args)
File "/data/learnedbloomfilter/python/classifier/train.py", line 519, in run
args.log_every,
File "/data/learnedbloomfilter/python/classifier/train.py", line 88, in train
predictions = model(features)
File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/data/learnedbloomfilter/python/classifier/embedding_lstm.py", line 65, in forward
sequence, self.hidden
File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 570, in forward
self.dropout, self.training, self.bidirectional, self.batch_first)
(print_stack at /opt/conda/conda-bld/pytorch_1587428190859/work/torch/csrc/autograd/python_anomaly_mode.cpp:60)
Traceback (most recent call last):
File "main.py", line 240, in <module>
main_loop(args)
File "main.py", line 115, in main_loop
train.run(args)
File "/data/learnedbloomfilter/python/classifier/train.py", line 519, in run
args.log_every,
File "/data/learnedbloomfilter/python/classifier/train.py", line 97, in train
loss.backward(retain_graph=True)
File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8, 32]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace
further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I have isolated the error to forward(), but I cannot find the intermediate tensor [torch.FloatTensor [8, 32]] that seems to be causing the problem (none of the tensor shapes in my forward method match it, so it must be inside the LSTM's own forward()). I am running on CPU only, no CUDA.
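For what it's worth, the same class of error (down to the [8, 32] TBackward shape, the transpose of an LSTM's [32, 8] weight_hh for hidden_size 8) can be reproduced by carrying a hidden state across batches with loss.backward(retain_graph=True) while the optimizer updates the LSTM weights in place between steps. A minimal sketch, assuming that is the pattern in my training loop (the layer sizes are made up for the repro):

```python
import torch
import torch.nn as nn

# Minimal sketch: carry the hidden state across batches while the optimizer
# updates the LSTM weights in place between backward() calls.
torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8)   # weight_hh_l0 has shape [32, 8]
opt = torch.optim.SGD(lstm.parameters(), lr=0.1)

hidden = (torch.zeros(1, 1, 8), torch.zeros(1, 1, 8))
raised = None
for step in range(2):
    out, hidden = lstm(torch.randn(5, 1, 4), hidden)
    loss = out.sum()
    try:
        loss.backward(retain_graph=True)
    except RuntimeError as e:
        raised = e        # fires on the second iteration
        break
    opt.step()            # in-place weight update bumps the version counter
print(raised)
```

Here the in-place modification is the optimizer step itself, not anything inside forward(): the second backward() reaches back through the stored hidden state into the first batch's graph, whose saved weight tensors have been updated in place in the meantime.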
For the rest of the RNN code, see this gist: https://gist.github.com/yaatehr/aac21cae05b24101f2369c97cfecb47b
Thanks!
Posted on 2020-05-10 13:19:23
Please post the full code for the model.
The error means that at some point a variable that is being tracked for gradients was modified in place. In-place operations are any PyTorch ops with a trailing underscore (i.e., torch.add_ as opposed to torch.add). Reassigning a variable at some point can also cause this.
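A minimal illustration of the mechanism (not the poster's model): sigmoid's backward reuses its own output, so mutating that output in place trips autograd's version-counter check.

```python
import torch

w = torch.ones(3, requires_grad=True)
z = torch.sigmoid(w)   # sigmoid's backward needs its output z
z.add_(1)              # in-place op (trailing underscore) bumps z's version
try:
    z.sum().backward()
except RuntimeError as e:
    print(e)           # "... modified by an inplace operation ..."
```

Replacing z.add_(1) with the out-of-place z = z + 1 makes the backward pass succeed, because the original output tensor that autograd saved is left untouched.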
https://stackoverflow.com/questions/61700779