I have a question from working through the "Neural machine translation with attention" example.
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # output shape == (batch_size, vocab)
        x = self.fc(output)
        return x, state, attention_weights

Why are the attention weights computed from the encoder output and the encoder hidden state, while the context vector is combined with the decoder embedding? In my view, the attention weights should be computed from the encoder output and each hidden element of the decoder output, and the context vector should be combined with the decoder output.
Maybe I just haven't fully understood the idea behind seq2seq?
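For reference, this is roughly what the BahdanauAttention layer used by the decoder looks like (reproduced from memory of the same tutorial, so details may differ from the current notebook). I include it because my question is really about which hidden state is passed in as the query:

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: previous decoder hidden state, shape (batch_size, hidden_size)
        # values: all encoder outputs, shape (batch_size, max_length, hidden_size)
        query_with_time_axis = tf.expand_dims(query, 1)

        # additive (Bahdanau) score for every encoder position
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))

        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after the sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights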
Posted 2019-10-30 17:30:48
Attention is applied at every decoder step. The inputs to a decoder step are:

- the encoder hidden states enc_output (computed once, before decoding starts),
- the decoder hidden state from the previous step, hidden,
- the previously decoded token (or the ground-truth token during training), x.

As you correctly say, the attention takes the single decoder hidden state and all the encoder hidden states as input and gives you a context vector:
context_vector, attention_weights = self.attention(hidden, enc_output)

The context vector is concatenated with the embedding only after the attention mechanism has been called, and only then is the result used as input to the GRU cell:
x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
output, state = self.gru(x)

In the next decoder step, the state returned here becomes hidden (for a single time step it carries the same values as output).
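To make that feedback explicit, here is a minimal sketch of the step-by-step decoding loop, along the lines of the tutorial's training loop; names such as targ, targ_lang, loss_function, enc_hidden and batch_sz are assumed from that example:

# decoder starts from the encoder's final hidden state
dec_hidden = enc_hidden
dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * batch_sz, 1)

loss = 0
for t in range(1, targ.shape[1]):
    # attention over enc_output happens inside this call, using dec_hidden
    predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
    loss += loss_function(targ[:, t], predictions)
    # teacher forcing: feed the ground-truth token as the next input
    dec_input = tf.expand_dims(targ[:, t], 1)

The key point is the second return value: the state produced by one step is exactly the hidden that the attention uses at the next step.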
https://stackoverflow.com/questions/58618837