I have a question from working through the "Neural machine translation with attention" example.
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # output shape == (batch_size, vocab)
        x = self.fc(output)
        return x, state, attention_weights

Why are the attention weights computed from the encoder output and the encoder hidden state, while the context vector is combined with the decoder embedding? In my view, the attention weights should be computed from the encoder output and each hidden element of the decoder output, and the context vector should be combined with the decoder output.
Maybe I just haven't fully understood the idea behind seq2seq?
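For reference, this is roughly what the BahdanauAttention layer used by the decoder looks like (reproduced from memory of the same tutorial, so details may differ from the current notebook). I include it because my question is really about which hidden state is passed in as the query:

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: previous decoder hidden state, shape (batch_size, hidden_size)
        # values: all encoder outputs, shape (batch_size, max_length, hidden_size)
        query_with_time_axis = tf.expand_dims(query, 1)

        # additive (Bahdanau) score for every encoder position
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))

        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after the sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights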
Posted 2019-10-30 17:30:48
Attention is applied at every decoder step. The inputs to a decoder step are:

- the encoder hidden states enc_output (computed once, before decoding starts),
- the decoder hidden state from the previous step, hidden,
- the previously decoded token (or the ground-truth token during training), x.

As you correctly say, the attention takes the single decoder hidden state and all the encoder hidden states as input and gives you a context vector:
context_vector, attention_weights = self.attention(hidden, enc_output)

The context vector is concatenated with the embedding only after the attention mechanism has been called, and only then is the result used as input to the GRU cell:
x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
output, state = self.gru(x)

In the next decoder step, the state returned here becomes hidden (for a single time step it carries the same values as output).
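To make that feedback explicit, here is a minimal sketch of the step-by-step decoding loop, along the lines of the tutorial's training loop; names such as targ, targ_lang, loss_function, enc_hidden and batch_sz are assumed from that example:

# decoder starts from the encoder's final hidden state
dec_hidden = enc_hidden
dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * batch_sz, 1)

loss = 0
for t in range(1, targ.shape[1]):
    # attention over enc_output happens inside this call, using dec_hidden
    predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
    loss += loss_function(targ[:, t], predictions)
    # teacher forcing: feed the ground-truth token as the next input
    dec_input = tf.expand_dims(targ[:, t], 1)

The key point is the second return value: the state produced by one step is exactly the hidden that the attention uses at the next step.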
https://stackoverflow.com/questions/58618837