How to use DropoutWrapper for LSTM training and decoding in TensorFlow


tensorflow


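When I use dropout for my LSTM, the model without dropout achieves better ROUGE scores and lower loss than the model with dropout. So I'd like to know whether my dropout code is correct. I am using TensorFlow 0.12.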

  cellClass = tf.nn.rnn_cell.LSTMCell
  for layer_i in xrange(hps.enc_layers):
    with tf.variable_scope('encoder%d' % layer_i), tf.device(
        self._next_device()):
      # Bidirectional RNN cells for this encoder layer
      cell_fw = cellClass(
          hps.num_hidden,
          initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=123),
          state_is_tuple=False)
      cell_bw = cellClass(
          hps.num_hidden,
          initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=113),
          state_is_tuple=False)
      # Dropout on the cell inputs and outputs. Despite the hps field names,
      # these arguments are KEEP probabilities: 1.0 means no dropout.
      cell_fw = tf.nn.rnn_cell.DropoutWrapper(
          cell_fw, input_keep_prob=hps.input_dropout,
          output_keep_prob=hps.output_dropout)
      cell_bw = tf.nn.rnn_cell.DropoutWrapper(
          cell_bw, input_keep_prob=hps.input_dropout,
          output_keep_prob=hps.output_dropout)
      # The outputs of this layer become the inputs of the next layer
      (emb_encoder_inputs, fw_state, _) = tf.nn.bidirectional_rnn(
          cell_fw, cell_bw, emb_encoder_inputs, dtype=tf.float32,
          sequence_length=article_lens)

  # Decoder: built once, after all encoder layers (not inside the loop)
  cell = cellClass(
      hps.num_hidden,
      initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=113),
      state_is_tuple=False)
  cell = tf.nn.rnn_cell.DropoutWrapper(
      cell, input_keep_prob=hps.input_dropout,
      output_keep_prob=hps.output_dropout)
  decoder_outputs, self._dec_out_state, self.cur_attns, self.cur_alpha = seq2seq.attention_decoder(
      emb_decoder_inputs, self._dec_in_state, self._enc_top_states,
      cell, num_heads=1, loop_function=loop_function,
      initial_state_attention=initial_state_attention)

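During training I set those keep probabilities to the value I want, e.g. 0.5. When computing the loss on the training and validation sets I also set them to 0.5, but in the decoding step I use 1.0, so nothing is dropped. Is that right?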

Almost.

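When computing accuracy and validation scores, you need to manually set keep_prob to 1.0 so that no units are actually dropped while you evaluate the network. If you don't, each evaluation pass randomly zeroes part of the activations, so you are essentially misjudging what your network has learned to predict so far. This will definitely hurt your accuracy/validation scores, especially with a dropout rate as high as 50%. In other words, keep 0.5 for the training steps themselves, but use 1.0 whenever you compute losses or scores for evaluation.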
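To make the "set keep_prob to 1.0 at evaluation" point concrete, here is a minimal pure-Python sketch of inverted dropout, the variant `tf.nn.dropout` implements (each kept element is scaled by `1 / keep_prob`, the rest become 0). The function name `inverted_dropout` and the list-of-floats representation are illustrative assumptions, not TensorFlow API. With `keep_prob=0.5` the output is random on every call, which is exactly the noise that would leak into an evaluation run; with `keep_prob=1.0` the input passes through unchanged.

```python
import random


def inverted_dropout(activations, keep_prob, rng=random):
    """Apply inverted dropout to a list of activations (illustrative sketch).

    Each element is kept with probability keep_prob and scaled by 1/keep_prob,
    otherwise it is set to 0.0. keep_prob == 1.0 returns the input unchanged,
    which is the setting to use for evaluation and decoding.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if keep_prob == 1.0:
        return list(activations)  # evaluation: nothing dropped, no rescaling
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


h = [0.2, -1.0, 0.5, 3.0]
rng = random.Random(0)
train_out = inverted_dropout(h, keep_prob=0.5, rng=rng)  # each entry is 0.0 or h/0.5
eval_out = inverted_dropout(h, keep_prob=1.0)            # deterministic, equals h
```

Because of the `1 / keep_prob` scaling during training, the expected value of each activation is unchanged, so turning dropout off at evaluation needs no extra correction.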

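Using dropout in the decoder is optional and worth experimenting with. If you do use it, you would set its keep probability to something other than 1.0 during training.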
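One caveat for the code in the question: `hps.input_dropout` and `hps.output_dropout` are passed to `DropoutWrapper` as plain floats, so they are fixed when the graph is built. To use different values for training and evaluation you would either build the model once per mode with the right values, or pass scalar tensors instead (`DropoutWrapper` accepts a float or a scalar tensor), for example `tf.placeholder_with_default(1.0, shape=[])`, feeding the training value only on training steps. The sketch below only shows the per-mode selection logic, with decoder dropout as an optional switch; `KeepProbs`, `keep_probs_for_mode`, and the mode names are hypothetical, not part of the asker's code or of TensorFlow.

```python
from typing import NamedTuple


class KeepProbs(NamedTuple):
    """Keep probabilities for one DropoutWrapper (hypothetical helper type)."""
    input_keep: float
    output_keep: float


NO_DROPOUT = KeepProbs(1.0, 1.0)


def keep_probs_for_mode(mode, train_probs, decoder_dropout=True):
    """Return (encoder, decoder) keep probabilities for a run mode.

    mode: "train" (gradient steps), "eval" (validation loss/score) or
    "decode" (inference). Dropout is only active in "train", and decoder
    dropout can be switched off independently of encoder dropout.
    """
    if mode == "train":
        decoder = train_probs if decoder_dropout else NO_DROPOUT
        return train_probs, decoder
    if mode in ("eval", "decode"):
        return NO_DROPOUT, NO_DROPOUT
    raise ValueError("unknown mode: %r" % (mode,))


train = KeepProbs(0.5, 0.5)
enc, dec = keep_probs_for_mode("train", train, decoder_dropout=False)
```

Here `enc` keeps the training values while `dec` disables dropout in the decoder; any call with `"eval"` or `"decode"` returns 1.0 everywhere.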

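As a refresher for passers-by: the idea behind dropout is to randomly zero out a fraction of the units' activations on each training step, so that neurons cannot become overly dependent on one another (co-adapt, or whatever term you prefer), which would otherwise lead the network to overfit. Remember that, in general, we are trying to fit our network to approximate some function. Since fitting a network is essentially an optimization problem, we also have to worry about settling into a poor local minimum (or maximum, depending on which way the picture is oriented in your head). Dropout is a form of regularization that helps us avoid overfitting.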
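The reason keep_prob can simply be set to 1.0 at evaluation, with no further rescaling, is the scaling `tf.nn.dropout` (which `DropoutWrapper` uses internally) applies during training: each kept element is divided by the keep probability. Writing $p$ for the keep probability, $h_i$ for an activation, and $m_i$ for an independent Bernoulli mask, the expected value of each activation is preserved:

```latex
m_i \sim \mathrm{Bernoulli}(p), \qquad
\tilde{h}_i = \frac{m_i}{p}\, h_i, \qquad
\mathbb{E}[\tilde{h}_i] = \frac{\mathbb{E}[m_i]}{p}\, h_i = \frac{p}{p}\, h_i = h_i .
```

At evaluation $p = 1$, so $m_i = 1$ always and $\tilde{h}_i = h_i$ deterministically.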

If anyone has further insights or corrections, please post them.