Predicting the next word using the LSTM ptb model tensorflow example

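I am trying to use TensorFlow to predict the next word.

As described in this related question (which has no accepted answer), the example contains pseudocode for extracting next-word probabilities: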

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
  # The value of state is updated after processing each batch of words.
  output, state = lstm(current_batch_of_words, state)

  # The LSTM output can be used to make next word predictions
  logits = tf.matmul(output, softmax_w) + softmax_b
  probabilities = tf.nn.softmax(logits)
  loss += loss_function(probabilities, target_words)

I am confused about how to interpret the probabilities vector. I modified the __init__ function of PTBModel to store the probabilities and logits:

class PTBModel(object):
  """The PTB model."""

  def __init__(self, is_training, config):
    # General definition of LSTM (unrolled)
    # identical to tensorflow example ...     
    # omitted for brevity ...


    # computing the logits (also from example code)
    logits = tf.nn.xw_plus_b(output,
                             tf.get_variable("softmax_w", [size, vocab_size]),
                             tf.get_variable("softmax_b", [vocab_size]))
    loss = seq2seq.sequence_loss_by_example([logits],
                                            [tf.reshape(self._targets, [-1])],
                                            [tf.ones([batch_size * num_steps])],
                                            vocab_size)
    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = states[-1]

    # my addition: storing the probabilities and logits
    self.probabilities = tf.nn.softmax(logits)
    self.logits = logits

    # more model definition ...

I then print some information about them in the run_epoch function:

def run_epoch(session, m, data, eval_op, verbose=True):
  """Runs the model on the given data."""
  # first part of function unchanged from example

  for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
                                                    m.num_steps)):
    # evaluate probability and logit tensors too:
    cost, state, probs, logits, _ = session.run([m.cost, m.final_state, m.probabilities, m.logits, eval_op],
                                 {m.input_data: x,
                                  m.targets: y,
                                  m.initial_state: state})
    costs += cost
    iters += m.num_steps

    if verbose and step % (epoch_size // 10) == 10:
      print("%.3f perplexity: %.3f speed: %.0f wps, n_iters: %s" %
            (step * 1.0 / epoch_size, np.exp(costs / iters),
             iters * m.batch_size / (time.time() - start_time), iters))
      chosen_word = np.argmax(probs, 1)
      print("Probabilities shape: %s, Logits shape: %s" % 
            (probs.shape, logits.shape) )
      print(chosen_word)
      print("Batch size: %s, Num steps: %s" % (m.batch_size, m.num_steps))

  return np.exp(costs / iters)

This produces output like the following:

0.000 perplexity: 741.577 speed: 230 wps, n_iters: 220
(20, 10000) (20, 10000)
[ 14   1   6 589   1   5   0  87   6   5   3   5   2   2   2   2   6   2  6   1]
Batch size: 1, Num steps: 20

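I expected the probs vector to be an array of probabilities, one for each word in the vocabulary (e.g. with shape (1, vocab_size)), which would mean I could get the predicted word using np.argmax(probs, 1), as suggested in the other question.

However, the first dimension of the vector is actually equal to the number of steps in the unrolled LSTM (20 if the small config settings are used), and I am not sure what to do with that. To access the predicted word, do I just need to use the last value (because it is the output of the final step)? Or is there something else I am missing?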

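I tried to understand how predictions are made and evaluated by looking at the implementation of the loss function, which must perform this evaluation, but it ends up calling gen_nn_ops._sparse_softmax_cross_entropy_with_logits, which does not seem to be included in the GitHub repo, so I am not sure where else to look.

I am new to both TensorFlow and LSTMs, so any help is much appreciated.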

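The output tensor contains the concatenation of the LSTM cell outputs for each time step (see its definition). You can therefore find the prediction for the next word by taking chosen_word[-1] (or chosen_word[sequence_length - 1] if the sequence has been padded to match the unrolled LSTM).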
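As a minimal sketch of that selection, assuming batch_size = 1 so that probs has one row per unrolled step. The toy sizes, the random stand-in for probs, and the value of sequence_length are illustrative assumptions, not part of the PTB example:

```python
import numpy as np

num_steps, vocab_size = 20, 50          # toy sizes, chosen only for illustration
rng = np.random.default_rng(0)

# Stand-in for the probs array returned by session.run with batch_size = 1:
# one row of word probabilities per unrolled time step.
logits = rng.normal(size=(num_steps, vocab_size))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

chosen_word = np.argmax(probs, 1)       # shape (num_steps,): one word id per step

# Row t is the prediction for the word that follows input word t,
# so the last row predicts the word after the whole input window.
next_word = chosen_word[-1]

# If only the first sequence_length inputs are real and the rest is padding,
# read the prediction at the last real step instead.
sequence_length = 7                     # hypothetical unpadded length
next_word_padded = chosen_word[sequence_length - 1]
```

The earlier entries of chosen_word are still valid predictions, but each one is for the word following an earlier position in the input.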

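The op is documented in the public API under a different name. For technical reasons, it calls a generated wrapper function that does not appear in the GitHub repository. The op's implementation is in C++.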

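I am implementing a seq2seq model too. Let me try to explain based on my understanding:

The outputs of the LSTM model are a list (of length num_steps) of 2D tensors of size [batch_size, size].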

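The line of code:

output = tf.reshape(tf.concat(1, outputs), [-1, size])

produces a new output, which is a 2D tensor of size [batch_size x num_steps, size].

For your case, batch_size = 1 and num_steps = 20, so the output shape is [20, size].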
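The row ordering after this reshape can be checked with a small NumPy analogue. This is only a sketch: np.concatenate and reshape stand in for tf.concat and tf.reshape, and the toy sizes and marker values are made up for illustration.

```python
import numpy as np

batch_size, num_steps, size = 2, 3, 4   # toy sizes for illustration

# One [batch_size, size] array per time step; each entry encodes
# 100 * batch index + time step so rows can be traced after reshaping.
outputs = [np.full((batch_size, size), t) + 100 * np.arange(batch_size)[:, None]
           for t in range(num_steps)]

# NumPy analogue of tf.reshape(tf.concat(1, outputs), [-1, size]).
output = np.concatenate(outputs, axis=1).reshape(-1, size)

print(output.shape)     # (batch_size * num_steps, size)
print(output[:, 0])     # row b * num_steps + t holds step t of batch item b
```

So the rows are grouped by batch item first, with the time steps of each sequence stored consecutively.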

The line of code:

logits = tf.nn.xw_plus_b(output, tf.get_variable("softmax_w", [size, vocab_size]), tf.get_variable("softmax_b", [vocab_size]))

multiplies output [batch_size x num_steps, size] by softmax_w [size, vocab_size], producing logits of size [batch_size x num_steps, vocab_size].

For your case, logits has size [20, vocab_size], and the probs tensor has the same size as logits: [20, vocab_size].
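The shape bookkeeping can be reproduced in plain NumPy. This is a sketch with random stand-in weights and toy sizes (all assumptions); it mirrors the arithmetic of xw_plus_b followed by a softmax, not the TensorFlow graph itself.

```python
import numpy as np

batch_size, num_steps, size, vocab_size = 1, 20, 8, 30   # toy sizes
rng = np.random.default_rng(0)

output = rng.normal(size=(batch_size * num_steps, size))  # flattened LSTM outputs
softmax_w = rng.normal(size=(size, vocab_size))           # random stand-in weights
softmax_b = np.zeros(vocab_size)

# Same arithmetic as tf.nn.xw_plus_b: a matrix product plus a broadcast bias.
logits = output @ softmax_w + softmax_b

# Row-wise softmax; subtracting the row maximum only improves numerical stability.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(logits.shape, probs.shape)   # both (batch_size * num_steps, vocab_size)
```

Each row of probs sums to one, so every time step gets its own distribution over the vocabulary.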
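The line of code:

chosen_word = np.argmax(probs, 1)

outputs a chosen_word array of shape [20], where each value is the index of the predicted next word for the corresponding current word.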
The line of code:

loss = seq2seq.sequence_loss_by_example([logits], [tf.reshape(self._targets, [-1])], [tf.ones([batch_size * num_steps])])

computes the softmax cross-entropy loss for the batch of sequences.
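As I understand it, with all weights set to one this amounts to one cross-entropy value per row of logits, and the perplexity printed by run_epoch is the exponential of the average of those values. A NumPy sketch under that assumption, with random stand-in logits and targets:

```python
import numpy as np

num_rows, vocab_size = 20, 30            # toy sizes; num_rows = batch_size * num_steps
rng = np.random.default_rng(0)
logits = rng.normal(size=(num_rows, vocab_size))       # random stand-in logits
targets = rng.integers(0, vocab_size, size=num_rows)   # random stand-in target ids

# Log-softmax computed row by row in a numerically stable way.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Cross entropy per position: minus the log-probability of the target word.
loss = -log_probs[np.arange(num_rows), targets]

# Perplexity is the exponential of the average per-word loss.
perplexity = np.exp(loss.sum() / num_rows)
```

The loss only looks at the probability assigned to the target word in each row; it never takes an argmax, which is why the prediction has to be extracted from probs separately.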
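Why chosen_word[-1]? chosen_word has size batch_size * num_steps. Does the model predict one word for each step? It seems so. It seems to be predicting the next num_steps words. Is that right?

@Eugrandory: The answer assumes that you are only interested in the last word to be predicted, not the intermediate words.

I am still looking for an answer on how to predict words using the LSTM. I am also looking for working code to do this.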
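For the case raised in the comments, where chosen_word has batch_size * num_steps entries, one possible way to pull out the final prediction for each sequence is sketched below. It assumes the batch-major row order produced by the reshape described in the answer above; the sizes and the random probs are illustrative only.

```python
import numpy as np

batch_size, num_steps, vocab_size = 3, 5, 11   # toy sizes
rng = np.random.default_rng(0)
probs = rng.random(size=(batch_size * num_steps, vocab_size))  # stand-in for probs
probs /= probs.sum(axis=1, keepdims=True)

chosen_word = np.argmax(probs, 1)              # shape (batch_size * num_steps,)

# Rows are ordered batch-major (row b * num_steps + t), so reshaping gives
# one row of per-step predictions for each sequence in the batch.
per_step = chosen_word.reshape(batch_size, num_steps)

next_words = per_step[:, -1]                   # prediction after the last input word
```

Here per_step[b, t] is the model's guess for the word following position t of sequence b, and next_words keeps only the guess after the final position.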