
Using past and attention_mask at the same time with GPT-2

Stack Overflow user
Asked on 2020-02-28 21:11:27
1 answer · 606 views · 0 followers · 0 votes

I'm processing a batch of sentences of different lengths, so I plan to take advantage of the padding + attention_mask functionality in GPT-2 for that.

At the same time, for each sentence I need to add a suffix phrase and run N different inferences. For example, given the sentence "I like to drink coke", I may need to run two different inferences: "I like to drink coke. Coke is good" and "I like to drink coke. Drink is good". I am therefore trying to shorten the inference time by using the "past" functionality (https://huggingface.co/transformers/quickstart.html#using-the-past), so that I process the original sentence ("I like to drink coke") only once, and then somehow expand the result so it can be reused with the two other sentences: "Coke is good" and "Drink is good".

Below you will find a simple piece of code that attempts to show how I was trying to do this. For simplicity, I am just adding a single suffix phrase per sentence (...but I still hope my original idea is possible):

from transformers.tokenization_gpt2 import GPT2Tokenizer
from transformers.modeling_gpt2 import GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2', pad_token='<|endoftext|>')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Complete phrases are: "I like to drink soda without sugar" and "Go watch TV alone, I am not going"
docs = ["I like to drink soda", "Go watch TV"]
docs_tensors = tokenizer.batch_encode_plus(
    [d for d in docs], pad_to_max_length=True, return_tensors='pt')

docs_next = ["without sugar", "alone, I am not going"]
docs_next_tensors = tokenizer.batch_encode_plus(
    [d for d in docs_next], pad_to_max_length=True, return_tensors='pt')

# predicting the first part of each phrase
_, past = model(docs_tensors['input_ids'], attention_mask=docs_tensors['attention_mask'])

# predicting the rest of the phrase
logits, _ = model(docs_next_tensors['input_ids'], attention_mask=docs_next_tensors['attention_mask'], past=past)
logits = logits[:, -1]
_, top_indices_results = logits.topk(30)

The error I get is the following:

Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/damiox/Workspace/xxLtd/yy/stress-test-withpast2.py", line 26, in <module>
    logits, _ = model(docs_next_tensors['input_ids'], attention_mask=docs_next_tensors['attention_mask'], past=past)
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 593, in forward
    inputs_embeds=inputs_embeds,
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 476, in forward
    hidden_states, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 226, in forward
    self.ln_1(x), layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 189, in forward
    attn_outputs = self._attn(query, key, value, attention_mask, head_mask)
  File "/Users/damiox/.local/share/virtualenvs/yy-uMxmjV2h/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 150, in _attn
    w = w + attention_mask
RuntimeError: The size of tensor a (11) must match the size of tensor b (6) at non-singleton dimension 3

Process finished with exit code 1

Initially I thought this was related to https://github.com/huggingface/transformers/issues/3031, so I rebuilt the latest master to try the fix, but I still run into the issue.


1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-03-01 22:35:41

To make your current code snippet work, you have to combine the old attention mask and the new attention mask as follows:

from transformers.tokenization_gpt2 import GPT2Tokenizer
from transformers.modeling_gpt2 import GPT2LMHeadModel
import torch

tokenizer = GPT2Tokenizer.from_pretrained('gpt2', pad_token='<|endoftext|>')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Complete phrases are: "I like to drink soda without sugar" and "Go watch TV alone, I am not going"
docs = ["I like to drink soda", "Go watch TV"]
docs_tensors = tokenizer.batch_encode_plus(
    [d for d in docs], pad_to_max_length=True, return_tensors='pt')

docs_next = ["without sugar", "alone, I am not going"]
docs_next_tensors = tokenizer.batch_encode_plus(
    [d for d in docs_next], pad_to_max_length=True, return_tensors='pt')

# predicting the first part of each phrase
_, past = model(docs_tensors['input_ids'], attention_mask=docs_tensors['attention_mask'])

# predicting the rest of the phrase
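# combine the prefix mask and the suffix mask so attention covers both the
# cached past tokens and the newly fed suffix tokens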
attn_mask = torch.cat([docs_tensors['attention_mask'], docs_next_tensors['attention_mask']], dim=-1)
logits, _ = model(docs_next_tensors['input_ids'], attention_mask=attn_mask, past=past)
logits = logits[:, -1]
_, top_indices_results = logits.topk(30)

If you want to test two possible suffixes for one sentence start, you will probably have to duplicate your past variable as many times as you have suffixes. That means that the batch size of the prefix input_ids has to match the batch size of the suffix input_ids in order for it to work; a sketch of this duplication is shown below.
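Here is a minimal sketch of that duplication, assuming transformers 2.x, where each element of past has shape (2, batch, n_heads, seq_len, head_dim), so the batch dimension is dim 1. The suffix strings and n_suffixes below are hypothetical, not from the original question:

import torch

# hypothetical sketch: expand the cached past so each prefix serves two suffixes
n_suffixes = 2

# repeat each layer's cached keys/values along the batch dimension (dim 1)
expanded_past = [p.repeat_interleave(n_suffixes, dim=1) for p in past]

# the prefix attention mask must be expanded the same way (batch is dim 0 here)
expanded_prefix_mask = docs_tensors['attention_mask'].repeat_interleave(n_suffixes, dim=0)

# the suffix batch now needs batch_size * n_suffixes rows, ordered so that the
# suffixes of the first prefix come first, then those of the second prefix
suffixes = ["without sugar", "with ice", "alone, I am not going", "alone tonight"]
suffix_tensors = tokenizer.batch_encode_plus(
    suffixes, pad_to_max_length=True, return_tensors='pt')

attn_mask = torch.cat([expanded_prefix_mask, suffix_tensors['attention_mask']], dim=-1)
logits, _ = model(suffix_tensors['input_ids'], attention_mask=attn_mask, past=expanded_past)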

Also, if one of your prefix input_ids is padded, you will have to change the positional encoding input of the suffix input_ids (GPT-2 uses absolute positional encodings). That is not shown in the code above; please take a look at https://github.com/huggingface/transformers/issues/3021 to see how it is done. A sketch of the idea follows below.
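As a minimal sketch of that adjustment (an assumption based on the linked issue, not code from the answer): each suffix token's absolute position should continue from the true, unpadded length of its prefix, which can be recovered from the prefix attention mask:

import torch

# hypothetical sketch: explicit absolute position ids for the suffix tokens
prefix_lengths = docs_tensors['attention_mask'].sum(dim=1)  # true length of each prefix, shape (batch,)
suffix_len = docs_next_tensors['input_ids'].shape[1]

# each suffix token continues counting from its prefix's unpadded length
position_ids = prefix_lengths.unsqueeze(1) + torch.arange(suffix_len).unsqueeze(0)

logits, _ = model(docs_next_tensors['input_ids'],
                  attention_mask=attn_mask,
                  position_ids=position_ids,
                  past=past)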

Votes: 1
Original content on this page provided by Stack Overflow.
Source: https://stackoverflow.com/questions/60459292
