Training a TensorFlow language model with NCE or sampled softmax

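I'm adapting the TensorFlow RNN tutorial to train a language model with an NCE loss or sampled softmax, but I still want to report perplexities. However, the perplexities I get are very strange: with NCE I get several million (terrible!), whereas with sampled softmax I get a PPL of 700 after one epoch (too good to be true?!). I can't figure out what I'm doing wrong.

Here is my adaptation of PTBModel: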

class PTBModel(object):
  """The PTB model."""

  def __init__(self, is_training, config, loss_function="softmax"):
    ...
    w = tf.get_variable("proj_w", [size, vocab_size])
    w_t = tf.transpose(w)
    b = tf.get_variable("proj_b", [vocab_size])

    if loss_function == "softmax":
      logits = tf.matmul(output, w) + b
      loss = tf.nn.seq2seq.sequence_loss_by_example(
          [logits],
          [tf.reshape(self._targets, [-1])],
          [tf.ones([batch_size * num_steps])])
      self._cost = cost = tf.reduce_sum(loss) / batch_size
    elif loss_function == "nce":
      num_samples = 10
      labels = tf.reshape(self._targets, [-1,1])
      hidden = output
      loss = tf.nn.nce_loss(w_t, b,
                            hidden,
                            labels,
                            num_samples,
                            vocab_size)
    elif loss_function == "sampled_softmax":
      num_samples = 10
      labels = tf.reshape(self._targets, [-1,1])
      hidden = output
      loss = tf.nn.sampled_softmax_loss(w_t, b,
                                        hidden,
                                        labels,
                                        num_samples,
                                        vocab_size)

    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = state

The model is instantiated like this:

mtrain = PTBModel(is_training=True, config=config, loss_function="nce")
mvalid = PTBModel(is_training=True, config=config)

I'm not doing anything exotic here; changing the loss function should be pretty straightforward. So why doesn't it work?

Thanks,
Joris

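With the baseline model (softmax), you should get considerably better than 700 within one epoch. When you change the loss, you may need to re-tune some hyperparameters, the learning rate in particular.

Also, your evaluation model should report the true perplexity by using the full softmax. Are you doing that?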
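To make the point about "true perplexity" concrete, here is a minimal pure-Python sketch rather than the tutorial's actual code. It compares the negative log-likelihood under a softmax over the whole vocabulary with the loss of a softmax restricted to the target plus a few sampled negatives. The toy logits, targets, and helper names are all made up for illustration, and the log-Q correction that a real sampled softmax applies is left out for simplicity.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def full_softmax_nll(logits, target):
    """Negative log-likelihood of `target` under a softmax over all classes."""
    return log_sum_exp(logits) - logits[target]


def sampled_softmax_nll(logits, target, negatives):
    """Loss of a softmax restricted to the target plus sampled negatives.

    Simplification (assumption): no log-Q correction is applied.
    """
    candidates = [logits[target]] + [logits[j] for j in negatives]
    return log_sum_exp(candidates) - logits[target]


def perplexity(nlls):
    """exp of the mean per-word negative log-likelihood."""
    return math.exp(sum(nlls) / len(nlls))


# Hypothetical toy data: 3 time steps, vocabulary of 6 words.
logits_per_step = [
    [2.0, 0.5, -1.0, 0.0, 1.5, -0.5],
    [0.1, 0.2, 3.0, -2.0, 0.0, 1.0],
    [-1.0, 1.0, 0.0, 2.5, 0.5, 0.3],
]
targets = [4, 0, 3]
negatives = [[1, 2], [3, 5], [0, 4]]  # sampled classes, never the target

full = [full_softmax_nll(l, t) for l, t in zip(logits_per_step, targets)]
sampled = [
    sampled_softmax_nll(l, t, n)
    for l, t, n in zip(logits_per_step, targets, negatives)
]

true_ppl = perplexity(full)
sampled_ppl = perplexity(sampled)
print("perplexity from full softmax:", true_ppl)
print("exp(mean sampled loss):      ", sampled_ppl)
```

Because the restricted softmax normalizes over fewer classes, its loss can never exceed the full-softmax loss in this uncorrected form, so exponentiating it gives an optimistic number. That is one plausible reason a training-time figure can look too good, and why the evaluation graph should use the full softmax.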

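It seems that sampled softmax does indeed work: it ends at 129 after 13 epochs (SmallConfig) with negative samples. NCE, on the other hand, still fails me. The perplexity (computed with the full softmax, as you said) is in the millions. I agree that I need to re-tune, but even without tuning I would expect the perplexity to go down somewhat, not to increase from ~10k to 2M?!

FYI: NCE actually gives reasonable values for a low number of time steps. When you increase that number, it starts going crazy.

@niefpaarschoenen Hi, I'm looking into this right now. Did you find that using NCE improves performance, especially in terms of words per second? Thanks.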
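As a side note on the NCE numbers discussed in this thread, the sketch below illustrates why the NCE training loss itself is not a per-word cross-entropy and cannot be exponentiated into a perplexity. It follows the standard NCE formulation (one binary logistic term for the true word plus one per noise word, with each score shifted by the log of the expected noise count) and is not a copy of TensorFlow's implementation. The function names, the uniform noise distribution, and the vocabulary size of 10,000 are assumptions made for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def nce_loss(true_score, noise_scores, true_q, noise_qs):
    """Standard NCE loss for one example (hypothetical helper).

    true_score   : model score of the target word
    noise_scores : model scores of the k sampled noise words
    true_q       : noise-distribution probability of the target word
    noise_qs     : noise-distribution probabilities of the noise words
    """
    k = len(noise_scores)
    # -log(sigmoid(x)) == softplus(-x): the target should be classified as "data".
    loss = softplus(-(true_score - math.log(k * true_q)))
    # -log(1 - sigmoid(x)) == softplus(x): noise words should be classified as "noise".
    for score, q in zip(noise_scores, noise_qs):
        loss += softplus(score - math.log(k * q))
    return loss


# Assumed setup: untrained model (all scores 0), uniform noise over 10,000 words.
q_uniform = 1.0 / 10000
loss_k1 = nce_loss(0.0, [0.0], q_uniform, [q_uniform])
loss_k10 = nce_loss(0.0, [0.0] * 10, q_uniform, [q_uniform] * 10)

print("NCE loss with 1 noise sample:  ", loss_k1)
print("NCE loss with 10 noise samples:", loss_k10)
```

In this toy setup the loss grows with the number of noise samples, which shows that its scale depends on the sampling configuration rather than on how well the next word is predicted. It does not by itself explain a full-softmax perplexity in the millions; it only shows why the training loss and the evaluation metric have to be computed separately.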