Machine Learning: RNN language model in PyTorch repeatedly predicts the same three words

I'm trying to create a word-level language model using an RNN in PyTorch. Whenever I train, the loss is the same across the whole training set, and when I try to sample a new sentence, the same three words are predicted in the same order. For example, in my most recent attempt the RNN predicted "the", then "same", then "of", and that sequence just kept repeating. I've tried changing how I set up the RNN, including using an LSTM, a GRU, and different embeddings, but so far nothing has worked.

The way I train the RNN is to pick a 50-word sentence and then take increasingly large portions of it, with the next word as the target (a toy illustration of the scheme is sketched below). At the end of the sentence I have an EOS tag. I'm using text from Plato's Republic as my training set, embedded with a PyTorch embedding layer. I then feed it into an LSTM followed by a linear layer to get the right shape. I'm not sure whether the problem is in the RNN, the data, the training, or something else, so any help is greatly appreciated.
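
To make that scheme concrete, here is a toy illustration with a hypothetical four-word sentence (the actual makeInput()/makeTarget() helpers are shown further down):

words = ['the', 'republic', 'of', 'plato']
inputs = ['START'] + words  # What makeInput() builds
targets = words + ['EOS']   # What makeTarget() builds

for x in range(len(inputs)):
  print(inputs[: x + 1], '->', targets[x])
# ['START'] -> the
# ['START', 'the'] -> republic
# ['START', 'the', 'republic'] -> of
# ['START', 'the', 'republic', 'of'] -> plato
# ['START', 'the', 'republic', 'of', 'plato'] -> EOS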

If anyone has any experience with NLP or language modeling, I'd really appreciate any help in solving this. My end goal is just to be able to generate a sentence. Thanks in advance!

Here is my RNN:

class LanguageModel(nn.Module):
  """
    Class that defines the recurrent neural network.

    Methods
    -------
    forward(input, h, c)
      Forward propagation through the RNN.
    initHidden()
      Initializes the hidden and cell states.
  """
  def __init__(self, vocabSize, seqLen = 51, embeddingDim = 30, hiddenSize = 32, numLayers = 1, bid = False):
    """
      Initializes the class

      Parameters
      ----------
      vocabSize : int
        The length of the vocab dictionary.
      seqLen : int, optional
        The length of the input sequence.
      embeddingDim : int, optional
        The dimension of the embedding vectors produced by the encoder.
      hiddenSize : int, optional
        The size that the hidden state should be.
      numLayers : int, optional
        The number of LSTM layers.
      bid : bool, optional
        Whether the RNN should be bidirectional or not.
    """

    super(LanguageModel, self).__init__()
    self.hiddenSize = hiddenSize
    self.numLayers = numLayers

    # Set numDirections based on whether or not the RNN is bidirectional.
    self.numDirections = 2 if bid else 1

    self.encoder = nn.Embedding(vocabSize, embeddingDim)
    self.LSTM = nn.LSTM(input_size = embeddingDim, hidden_size = hiddenSize, num_layers = numLayers, bidirectional = bid)
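    # The decoder maps the LSTM outputs for the entire padded sequence,
    # flattened into a single vector, to scores over the vocabulary.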
    self.decoder = nn.Linear(seqLen * self.numDirections * hiddenSize, vocabSize)

  def forward(self, input, h, c):
    """
      Forward propagates through the RNN.

      Parameters
      ----------
      input : torch.Tensor
        Input to RNN. Should be formatted using makeInput() and padSeq().
      h : torch.Tensor
        Hidden state.
      c : torch.Tensor
        Cell state.

      Returns
      -------
      torch.Tensor
        Log probabilities for the predicted word from the RNN.
    """

    emb = self.encoder(input)
    emb.unsqueeze_(1) # Add in the batch dimension so the shape is right for the LSTM

    out, (h, c) = self.LSTM(emb, (h, c))
    out = out.view(1, -1) # Flatten the full sequence of LSTM outputs into one vector for the decoder.

    out = self.decoder(out)

    logProbs = F.log_softmax(out, dim = 1)

    return logProbs

  def initHidden(self):
    """
      Initializes the hidden and cell states.

      Returns
      -------
      torch.Tensor
        Tensor containing the initial hidden state.
      torch.Tensor
        Tensor containing the initial cell state.
    """
    h = torch.zeros(self.numLayers * self.numDirections, 1, self.hiddenSize)
    c = torch.zeros(self.numLayers * self.numDirections, 1, self.hiddenSize)
    
    return h, c
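
For reference, here is a minimal sketch of a single forward pass through this model, assuming a hypothetical 10-word vocabulary and the default seqLen of 51 (the input has to be padded to seqLen tokens, because the decoder flattens the whole output sequence):

import torch

model = LanguageModel(vocabSize = 10)
h, c = model.initHidden()

input = torch.zeros(51, dtype = torch.long) # 51 padded token indices
logProbs = model(input, h, c)
print(logProbs.shape) # torch.Size([1, 10])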

Here is how I create the inputs and targets:

def makeInput(sentence):
  """
    Prepares a sentence for input to the RNN.

    Parameters
    ----------
    sentence : list
      The sentence to be converted into input. Should be of form: [str] 

    Returns
    -------
    torch.Tensor
      Tensor of the indices for each word in the input sentence.
  """

  sen = sentence[0].split() # Split the string into individual words
  sen.insert(0, 'START')

  input = [word2Idx[word] for word in sen] # Iterate over the words in sentence and convert to indices

  return torch.tensor(input)

def makeTarget(sentence):
  """
    Prepares a sentence to be a target.

    Parameters
    ----------
      sentence : list
        The sentence to be made into a target. Should be of form: [str]

    Returns
    -------
    torch.Tensor
      Tensor of the indices for the target phrase including the <EOS> tag.
  """

  sen = sentence[0].split() # Split the string into individual words
  sen.append('EOS')
  
  target = [word2Idx[word] for word in sen]
  target = torch.tensor(target, dtype = torch.long)

  return target.unsqueeze_(-1) # Add a dimension so each target element has shape (1,) for the loss function
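
As an illustration of how these two helpers pair up (the sentence is made up, and this assumes word2Idx contains the four words below plus the 'START' and 'EOS' tokens):

sentence = ['the republic of plato'] # trainLoader yields a list containing one string
inp = makeInput(sentence)   # Indices of ['START', 'the', 'republic', 'of', 'plato']
tgt = makeTarget(sentence)  # Indices of ['the', 'republic', 'of', 'plato', 'EOS']
print(inp.shape, tgt.shape) # torch.Size([5]) torch.Size([5, 1])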

def padSeq(seq, refSeq):
  """
    Pads a sequence to be the same shape as another sequence.

    Parameters
    ----------
    seq : torch.Tensor
      The sequence to pad.
    refSeq : torch.Tensor
      The reference sequence. seq will be padded to be the same shape as refSeq.

    Returns
    -------
    torch.Tensor
      Tensor containing the padded sequence.
  """

  padded = pad_sequence([refSeq, seq])
  tmp = torch.t(padded) # Transpose the padded sequence for easier indexing on return

  return tmp[1] # Return only the padded seq not both sequences
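
And a quick shape check for padSeq(), using made-up indices: a 2-token prefix padded against a 5-token reference comes back as a length-5 tensor with trailing zeros.

import torch

ref = torch.tensor([4, 7, 2, 9, 3])
prefix = ref[: 2]
print(padSeq(prefix, ref)) # tensor([4, 7, 0, 0, 0])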

This is my first post on this site, so I'm not quite sure whether I've included everything needed to debug this problem. Please let me know if you need more information, thanks!

Finally, here is how I train the model:
def train():
  """
    Trains the model.
  """

  start = time.time()
  for i, data in enumerate(trainLoader):
    inputTensor = makeInput(data)
    targetTensor = makeTarget(data)

    targetTensor = targetTensor.to(device)

    h, c = model.initHidden()
    h = h.to(device)
    c = c.to(device)
    
    optimizer.zero_grad()
    loss = 0

    for x in range(inputTensor.size(0)): # Iterate over all of the words in the input sentence
      """ Preparing input for the rnn """
      input = inputTensor[: x + 1] # We only want part of the input so the RNN can learn on predicting the next words
      input = padSeq(input, inputTensor)
      input = input.to(device)

      out = model(input, h, c)
      l = criterion(out, targetTensor[x])
      loss += l

    loss.backward()
    optimizer.step()
  
    if i % 250 == 0: # Print an update on the model's average per-word loss every 250 iterations.
      print('[{}] Iter: {} -> {}'.format(timeSince(start), i, loss / inputTensor.size(0)))
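
For completeness, here is a sketch of the setup that train() assumes. Everything in it (the sentences variable, the optimizer choice, the timeSince() helper) is my reconstruction of the missing glue code, not necessarily the asker's actual setup:

import time
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Assumed: sentences is a list of 50-word strings from Plato's Republic,
# and word2Idx maps every word plus 'START' and 'EOS' to an index.
model = LanguageModel(vocabSize = len(word2Idx)).to(device)
criterion = nn.NLLLoss() # Matches the log_softmax output of the model
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-3)
trainLoader = DataLoader(sentences, batch_size = 1, shuffle = True)

def timeSince(start):
  """Elapsed time since start, formatted as seconds."""
  return '{:.0f}s'.format(time.time() - start)

train()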