Q: Effect of padding sequences in MultiHeadAttention (TensorFlow/Keras)

Stack Overflow user

Asked 2020-11-27 21:47:10

Answers: 2 · Views: 597 · Followers: 0 · Votes: 1

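I am trying to use MultiHeadAttention layers to process variable-length sets of elements, that is, sequences in which the order does not matter (otherwise I would try RNNs). The problem is that I am not sure I understand the effect of padding in the input sequence. My expectation is that the output for a sequence containing elements 1 and 2 should equal the output for the same sequence padded with 0s up to a given length. In other words, the inputs [1, 2] and [1, 2, 0] (or even [1, 2, 0, 0, 0, ...]) should produce the same output for the real inputs 1 and 2. I do not care about the output for 0, since I know it is a "fake" input used only for padding. Below is a piece of code that shows the output changing with the padding.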

import tensorflow as tf
import numpy as np

max_tokens = 10  # maximum length of any sequence
dimension = 5  # dimension of the vectors in the embedding

# Variable-length int sequences.
query_input = tf.keras.layers.Input(shape=(None,), dtype='int32')
value_input = tf.keras.layers.Input(shape=(None,), dtype='int32')

handmade_embedding = np.arange(max_tokens).reshape(max_tokens, 1) * np.ones(dimension)

# Embedding lookup.
token_embedding = tf.keras.layers.Embedding(input_dim=max_tokens, output_dim=dimension, mask_zero=True,
                                            embeddings_initializer=tf.constant_initializer(handmade_embedding),
                                            trainable=False)

# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)

attention_output, weights = \
    tf.keras.layers.MultiHeadAttention(num_heads=10, key_dim=10)(query=query_embeddings,
                                                                 value=value_embeddings,
                                                                 return_attention_scores=True)

model = tf.keras.Model(inputs=[query_input, value_input],
                       outputs=[query_embeddings, attention_output])
names = ('query_embeddings', 'attention_output')

model.summary()

q = np.array([[1, 2, 0]])
prediction = model.predict([q, q])  # self-attention

print('\nWITH PADDING')
for n, v in zip(names, prediction):
    print(f'\n{n}:\n{v}')

q = q[:, :-1]  # remove the padding column in this example
prediction = model.predict([q, q])  # self-attention
print('\nWITHOUT PADDING')
for n, v in zip(names, prediction):
    print(f'\n{n}:\n{v}')

The output of the MultiHeadAttention layer with padding is:

attention_output:
[[[-0.0374077  -0.03303239 -0.02354158 -0.04111823  0.08189851]
  [-0.04877335 -0.04348412 -0.012391   -0.04778382  0.09745573]
  [-0.02586985 -0.02244503 -0.03482261 -0.03429744  0.06620502]]]

And without padding:

attention_output:
[[[-0.04313684 -0.03764199 -0.04799934 -0.05400878  0.10519686]
  [-0.04743624 -0.041591   -0.04378954 -0.05654225  0.11106053]]]

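I expected the first and second output vectors to be the same in both cases, but they are not. I plan to process these vectors later and summarize them into a single vector (a mean or something similar), and I want the result to be independent of the padding length. What am I misunderstanding?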
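For the summarizing step mentioned in the question, the padded positions would also need to be left out of the mean, otherwise the result changes with the padding length. A minimal sketch with NumPy, assuming 0 is the padding id as in the question; the helper `masked_mean` and the example vectors are hypothetical and not part of the original code:

```python
import numpy as np


def masked_mean(vectors, token_ids, pad_id=0):
    """Average the rows of `vectors` whose token id differs from `pad_id`.

    vectors:   array of shape (batch, T, dim)
    token_ids: array of shape (batch, T)
    """
    # 1.0 for real tokens, 0.0 for padding; shape (batch, T, 1).
    is_real = (token_ids != pad_id).astype(vectors.dtype)[..., np.newaxis]
    total = (vectors * is_real).sum(axis=1)
    # Guard against sequences that contain only padding.
    count = np.maximum(is_real.sum(axis=1), 1.0)
    return total / count


# Hypothetical per-token outputs for the sequence [1, 2].
ids_plain = np.array([[1, 2]])
vecs_plain = np.array([[[1.0, 2.0], [3.0, 4.0]]])

# The same sequence padded to [1, 2, 0]; the padded row holds arbitrary values.
ids_padded = np.array([[1, 2, 0]])
vecs_padded = np.concatenate([vecs_plain, np.full((1, 1, 2), 9.0)], axis=1)

print(masked_mean(vecs_plain, ids_plain))
print(masked_mean(vecs_padded, ids_padded))
```

Both calls average only the two real rows, so the padded row does not affect the summary vector.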

tensorflow

keras

padding

masking

attention-model

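2 Answers

Stack Overflow user

Accepted answer

Answered 2021-07-09 17:23:58

Well, after leaving the code untouched on my computer for several months, it now seems that even the attention_mask is not needed. The output is now what I expected, that is, the outputs for the real entries are the same. Perhaps some internal change in TensorFlow affects this. I was going nuts...

Votes: 0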

Stack Overflow user

Answered 2020-11-27 22:41:46

You have to pass the attention_mask argument in the MultiHeadAttention call.

Votes: 0
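As an illustration of that suggestion, here is a minimal sketch of building a boolean mask and passing it through the `attention_mask` call argument. It is not the asker's exact model: the random embeddings, `num_heads=2`, and `key_dim=4` are assumptions made for the example, and the mask simply lets every query position attend only to the real (non-padding) key positions.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical small layer; the sizes are chosen only for this example.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

# Two real tokens with 5-dimensional embeddings (random stand-in values).
real = tf.random.normal((1, 2, 5))
# The same sequence followed by one all-zero padding position.
padded = tf.concat([real, tf.zeros((1, 1, 5))], axis=1)

# True marks a real position, False marks padding.
is_real = tf.constant([[True, True, False]])
# attention_mask has shape (batch, Tq, Tv): every query row sees real keys only.
attention_mask = tf.tile(is_real[:, tf.newaxis, :], [1, 3, 1])

out_real = mha(real, real)  # self-attention, no padding
out_padded = mha(padded, padded, attention_mask=attention_mask)

# Compare the outputs at the two real positions.
diff = np.abs(out_real.numpy() - out_padded.numpy()[:, :2]).max()
print("max abs difference on real positions:", diff)
```

With the mask in place, the padded key receives no attention weight, so the outputs for the real positions should agree with the unpadded run up to floating-point error. The output row for the padded query position is still computed and should be ignored downstream.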

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/65038445
