Q: Weights of the Seq2Seq model

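Stack Overflow user

Asked 2016-03-23 16:34:07

Answers: 2, Views: 911, Followers: 0, Votes: 0

I went through the code, and I'm afraid I'm missing an important point.

I can't seem to find the weight matrices of the encoder and decoder models, nor where they are updated. I found target_weights, but it seems to be re-initialized on every get_batch() call, so I don't really understand what it represents either.

My actual goal is to feed the hidden states of two source encoders into a single decoder by concatenating them and applying a linear transformation with a weight matrix that I will have to train along with the model (I'm building a many-to-one model), but because of the problem described above I don't know where to start.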

tensorflow

sequence

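2 Answers

Stack Overflow user

Answered 2016-03-24 00:04:50

This might help you get started. Several models are implemented in tensorflow.python.ops.seq2seq.py (with and without buckets, attention, etc.), but take a look at the definition of embedding_attention_seq2seq, which is the one called by the example model seq2seq_model.py that you seem to be referring to: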

def embedding_attention_seq2seq(encoder_inputs, decoder_inputs, cell,
                                num_encoder_symbols, num_decoder_symbols,
                                num_heads=1, output_projection=None,
                                feed_previous=False, dtype=dtypes.float32,
                                scope=None, initial_state_attention=False):

  with variable_scope.variable_scope(scope or "embedding_attention_seq2seq"):
    # Encoder.
    encoder_cell = rnn_cell.EmbeddingWrapper(cell, num_encoder_symbols)
    encoder_outputs, encoder_state = rnn.rnn(
        encoder_cell, encoder_inputs, dtype=dtype)

    # First calculate a concatenation of encoder outputs to put attention on.
    top_states = [array_ops.reshape(e, [-1, 1, cell.output_size])
                  for e in encoder_outputs]
    attention_states = array_ops.concat(1, top_states)
    ....

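You can see that it takes the top layer of the encoder outputs as top_states before passing them on to the decoder.

So you could implement something similar with two encoders and concatenate their states before passing them to the decoder.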
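
As a rough illustration of that idea, including the linear transformation mentioned in the question, here is a minimal NumPy sketch. It is not TensorFlow code: the function name combine_encoder_states, the shapes, and the random initial values are all assumptions made for the example. In an actual TensorFlow model the weight and bias would presumably be created as variables inside the model, so that tf.trainable_variables() returns them and they are trained along with everything else.

```python
import numpy as np


def combine_encoder_states(state_a, state_b, weight, bias):
    """Concatenate two encoder states and project them to the decoder size.

    state_a, state_b: arrays of shape [batch_size, encoder_size]
    weight:           array of shape [2 * encoder_size, decoder_size]
    bias:             array of shape [decoder_size]
    """
    # Join the two states along the feature axis: [batch, 2 * encoder_size].
    concatenated = np.concatenate([state_a, state_b], axis=1)
    # Linear transformation down to the size the decoder expects.
    return concatenated @ weight + bias


# Toy sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
batch_size, encoder_size, decoder_size = 4, 8, 8
state_a = rng.standard_normal((batch_size, encoder_size))
state_b = rng.standard_normal((batch_size, encoder_size))
weight = rng.standard_normal((2 * encoder_size, decoder_size)) * 0.1
bias = np.zeros(decoder_size)

decoder_initial_state = combine_encoder_states(state_a, state_b, weight, bias)
print(decoder_initial_state.shape)  # (4, 8)
```

The projected result has the same shape as a single encoder state, so it could stand in for the one state the decoder normally receives.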

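Votes: 1

Stack Overflow user

Answered 2017-06-28 19:40:30

The values created in the get_batch function are only used for the first iteration. Even though the weights are passed into the function every time, their values are updated as global variables in the init function of the Seq2Seq model class.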

    with tf.name_scope('Optimizer'):
        # Gradients and SGD update operation for training the model.
        params = tf.trainable_variables()
        if not forward_only:
            self.gradient_norms = []
            self.updates = []
            opt = tf.train.GradientDescentOptimizer(self.learning_rate)
            for b in range(len(buckets)):
                gradients = tf.gradients(self.losses[b], params)
                clipped_gradients, norm = tf.clip_by_global_norm(gradients,
                                                                 max_gradient_norm)
                self.gradient_norms.append(norm)
                self.updates.append(opt.apply_gradients(
                    zip(clipped_gradients, params), global_step=self.global_step))

    self.saver = tf.train.Saver(tf.global_variables())
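
To make the update step more concrete, here is a simplified NumPy sketch of what one clipped SGD step computes. It illustrates the idea under assumptions and is not the TensorFlow implementation: the helper names clip_by_global_norm_np and sgd_step and the toy numbers are hypothetical, and the scaling rule reflects my understanding of tf.clip_by_global_norm.

```python
import numpy as np


def clip_by_global_norm_np(gradients, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))
    # The scale is 1.0 when the norm is already small enough.
    scale = max_norm / max(global_norm, max_norm)
    return [g * scale for g in gradients], global_norm


def sgd_step(params, gradients, learning_rate, max_norm):
    """One plain gradient-descent update using the clipped gradients."""
    clipped, norm = clip_by_global_norm_np(gradients, max_norm)
    new_params = [p - learning_rate * g for p, g in zip(params, clipped)]
    return new_params, norm


# Toy parameters and gradients; the joint gradient norm is sqrt(9 + 16) = 5.
params = [np.array([1.0, 2.0]), np.array([[0.5]])]
gradients = [np.array([3.0, 0.0]), np.array([[4.0]])]

new_params, norm = sgd_step(params, gradients, learning_rate=0.5, max_norm=1.0)
print(norm)        # 5.0
print(new_params)  # values close to [0.7, 2.0] and [[0.1]]
```

In the excerpt, params comes from tf.trainable_variables(), so any trainable variable created in the model is updated this way each time an update op runs.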

The weights are fed separately as placeholders because they are normalized in the get_batch function so that padded inputs get zero weight.

    # Batch decoder inputs are re-indexed decoder_inputs, we create weights.
    for length_idx in range(decoder_size):
        batch_decoder_inputs.append(
            np.array([decoder_inputs[batch_idx][length_idx]
                      for batch_idx in range(self.batch_size)], dtype=np.int32))

        # Create target_weights to be 0 for targets that are padding.
        batch_weight = np.ones(self.batch_size, dtype=np.float32)
        for batch_idx in range(self.batch_size):
            # We set weight to 0 if the corresponding target is a PAD symbol.
            # The corresponding target is decoder_input shifted by 1 forward.
            if length_idx < decoder_size - 1:
                target = decoder_inputs[batch_idx][length_idx + 1]
            if length_idx == decoder_size - 1 or target == data_utils.PAD_ID:
                batch_weight[batch_idx] = 0.0
        batch_weights.append(batch_weight)
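
To see why target_weights are rebuilt for every batch, it may help to view them as a loss mask rather than as model parameters. The sketch below uses plain NumPy. The function name masked_average_loss and the toy numbers are made up for illustration, and the averaging only approximates what the library's sequence loss does. It shows positions with weight 0 dropping out of the loss:

```python
import numpy as np


def masked_average_loss(per_step_losses, target_weights):
    """Average the per-step losses of each example, ignoring padded steps.

    Both arguments have shape [num_steps, batch_size], matching the
    time-major layout that get_batch produces.
    """
    losses = np.asarray(per_step_losses, dtype=np.float32)
    weights = np.asarray(target_weights, dtype=np.float32)
    weighted_sum = np.sum(losses * weights, axis=0)
    # Small epsilon avoids division by zero for fully padded examples.
    total_weight = np.sum(weights, axis=0) + 1e-12
    return weighted_sum / total_weight


# Three decoder steps, two examples. The first example is padded at the
# last step, so its weight there is 0 and the loss of 9.0 is ignored.
per_step_losses = [[2.0, 1.0],
                   [4.0, 3.0],
                   [9.0, 8.0]]
target_weights = [[1.0, 1.0],
                  [1.0, 1.0],
                  [0.0, 1.0]]

print(masked_average_loss(per_step_losses, target_weights))  # close to [3.0, 4.0]
```

Because the mask depends only on where the padding falls in the current batch, it has to be recomputed in get_batch each time and is never trained.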

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/36173205
