How can I modify the TensorFlow sequence-to-sequence model to use a bidirectional LSTM instead of a unidirectional one?

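See this post for background on the question:

I am working on the same model and want to replace the unidirectional LSTM layer with a bidirectional one. I realize I have to use static_bidirectional_rnn instead of static_rnn, but I get an error caused by a tensor shape mismatch.

I replaced the following line: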

encoder_outputs, encoder_state = core_rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)

with this line:

encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)

This gives me the following error:

InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5,1256] vs. [16,1,1256] [[Node: gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape, gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape_1)]]

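I understand that the two methods return different outputs, but I don't know how to modify the attention code to account for that. How do I pass both the forward and the backward state to the attention module? Do I concatenate the two hidden states?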

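From the error message, the batch sizes of two tensors do not match somewhere: one is 32 and the other is 16. My guess is that this happens because the output of the bidirectional RNN is twice the size of the unidirectional one, and the code that follows was simply not adjusted to match.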
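To make the size difference concrete, here is a small NumPy sketch of what the encoder outputs look like in each case. It does not call TensorFlow; the arrays are stand-ins and the sizes (batch_size, num_steps, hidden_dim) are illustrative assumptions, not values from the post. As documented, static_bidirectional_rnn returns one output per input step, each being the forward and backward outputs concatenated along the feature axis, so the feature size doubles while the list length stays the same.

```python
import numpy as np

# Illustrative sizes only (assumptions, not taken from the question).
batch_size, num_steps, hidden_dim = 16, 5, 256

# Stand-ins for the per-step outputs of the forward and backward cells.
fw_outputs = [np.zeros((batch_size, hidden_dim)) for _ in range(num_steps)]
bw_outputs = [np.ones((batch_size, hidden_dim)) for _ in range(num_steps)]

# One output per step, forward and backward concatenated on the feature axis.
bi_outputs = [np.concatenate([fw, bw], axis=1)
              for fw, bw in zip(fw_outputs, bw_outputs)]

# Attention states are built by stacking the per-step outputs. Read the
# attention size from the data instead of assuming it equals hidden_dim.
attention_states = np.stack(bi_outputs, axis=1)
attn_size = attention_states.shape[-1]

print(len(bi_outputs), attention_states.shape, attn_size)
```

Any downstream code that hard-codes the encoder cell size for the attention states or the decoder's initial state would need to use the doubled size, or the states would need to be reduced first.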

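"How do I pass both the forward and the backward state to the attention module? Do I concatenate the two hidden states?"

You can refer to the following code: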

  def _reduce_states(self, fw_st, bw_st):
    """Add to the graph a linear layer to reduce the encoder's final FW and BW state into a single initial state for the decoder. This is needed because the encoder is bidirectional but the decoder is not.
    Args:
      fw_st: LSTMStateTuple with hidden_dim units.
      bw_st: LSTMStateTuple with hidden_dim units.
    Returns:
      state: LSTMStateTuple with hidden_dim units.
    """
    hidden_dim = self._hps.hidden_dim
    with tf.variable_scope('reduce_final_st'):

      # Define weights and biases to reduce the cell and reduce the state
      w_reduce_c = tf.get_variable('w_reduce_c', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
      w_reduce_h = tf.get_variable('w_reduce_h', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
      bias_reduce_c = tf.get_variable('bias_reduce_c', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
      bias_reduce_h = tf.get_variable('bias_reduce_h', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)

      # Apply linear layer
      old_c = tf.concat(axis=1, values=[fw_st.c, bw_st.c]) # Concatenation of fw and bw cell
      old_h = tf.concat(axis=1, values=[fw_st.h, bw_st.h]) # Concatenation of fw and bw state
      new_c = tf.nn.relu(tf.matmul(old_c, w_reduce_c) + bias_reduce_c) # Get new cell from old cell
      new_h = tf.nn.relu(tf.matmul(old_h, w_reduce_h) + bias_reduce_h) # Get new state from old state
      return tf.contrib.rnn.LSTMStateTuple(new_c, new_h) # Return new cell and state
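The method above relies on its surrounding class (self._hps, self.trunc_norm_init) and on TensorFlow 1.x's tf.contrib, so it cannot be run on its own. As a rough illustration of the same arithmetic, here is a NumPy sketch: concatenate the forward and backward states on the feature axis, then apply a linear layer followed by ReLU. The helper name reduce_state, the random weights, and the sizes are all hypothetical.

```python
import numpy as np

def reduce_state(fw, bw, w, b):
    """Concatenate fw and bw on the feature axis, then apply ReLU(x @ w + b)."""
    concat = np.concatenate([fw, bw], axis=1)   # [batch, 2 * hidden_dim]
    return np.maximum(concat @ w + b, 0.0)      # [batch, hidden_dim]

rng = np.random.default_rng(0)
batch_size, hidden_dim = 4, 8                   # illustrative sizes only

# Stand-ins for the final forward/backward cell (c) and hidden (h) states.
fw_c, bw_c, fw_h, bw_h = (rng.normal(size=(batch_size, hidden_dim))
                          for _ in range(4))

# Separate weights for c and h, mirroring w_reduce_c / w_reduce_h above.
w_c = rng.normal(scale=0.1, size=(2 * hidden_dim, hidden_dim))
w_h = rng.normal(scale=0.1, size=(2 * hidden_dim, hidden_dim))
b_c = np.zeros(hidden_dim)
b_h = np.zeros(hidden_dim)

new_c = reduce_state(fw_c, bw_c, w_c, b_c)
new_h = reduce_state(fw_h, bw_h, w_h, b_h)
print(new_c.shape, new_h.shape)
```

Both reduced states come out with hidden_dim features, so a decoder cell with the same size as each encoder direction can take them as its initial state. Note that the ReLU makes the reduced states non-negative.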

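This seems to be what I was looking for. Let me try it and I will update if it works. Thanks.

This seems to work, but I have a question: why can't I simply double the size of the decoder cell instead of projecting the encoder state down to half? I understand that the projection reduces the number of parameters in the model, but won't I lose information because of it?

@LeenaShekhar Doubling the decoder cell size would also work. Here it is preferable to merge the two states of the bidirectional encoder into a single state, so that the encoder and decoder have the same cell size and size mismatches are avoided. This is done by applying the projection above to c and h separately.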
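For comparison, the alternative raised in the comments (doubling the decoder size) can be sketched as follows. This is a NumPy illustration, not TensorFlow code: LSTMState is a hypothetical stand-in for LSTMStateTuple, and lstm_param_count assumes a standard LSTM with one bias per gate and no peepholes or projection. The input size of 128 is also an assumption.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for tf.contrib.rnn.LSTMStateTuple.
LSTMState = namedtuple("LSTMState", ["c", "h"])

def concat_states(fw_st, bw_st):
    """Join forward and backward states on the feature axis, no projection."""
    return LSTMState(c=np.concatenate([fw_st.c, bw_st.c], axis=1),
                     h=np.concatenate([fw_st.h, bw_st.h], axis=1))

def lstm_param_count(input_dim, num_units):
    """Weights and biases of a standard LSTM: four gates, one bias each."""
    return 4 * num_units * (input_dim + num_units + 1)

batch_size, hidden_dim, input_dim = 4, 256, 128   # illustrative sizes only
fw_st = LSTMState(c=np.zeros((batch_size, hidden_dim)),
                  h=np.zeros((batch_size, hidden_dim)))
bw_st = LSTMState(c=np.ones((batch_size, hidden_dim)),
                  h=np.ones((batch_size, hidden_dim)))

# The decoder cell would need 2 * hidden_dim units to accept this state.
init_state = concat_states(fw_st, bw_st)

same_size = lstm_param_count(input_dim, hidden_dim)
doubled = lstm_param_count(input_dim, 2 * hidden_dim)
print(init_state.c.shape, same_size, doubled)
```

Concatenation keeps both encoder states intact, at the cost of a decoder cell with considerably more parameters. The projection keeps the decoder small but has to learn which information to retain.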