RNN language model in PyTorch repeatedly predicts the same three words
I am trying to build a word-level language model using an RNN in PyTorch. Whenever I train, the loss stays the same across the entire training set, and when I sample a new sentence, the same three words are predicted in the same order. For example, in my most recent attempt the RNN predicted "the", then "same", then "of", and that sequence kept repeating. I have tried changing how I set up the RNN, including using an LSTM, a GRU, and different embeddings, but nothing has worked so far.

The way I train the RNN is to take a 50-word sentence and select a progressively larger portion of it, with the next word as the target. At the end of the sentence I have an EOS tag. I am using text from Plato's Republic as my training set and embedding it with a PyTorch embedding layer. I then feed it into an LSTM followed by a linear layer to get the right shape. I am not sure whether the problem lies in the RNN, the data, the training, or something else, so any help would be greatly appreciated.

If anyone has experience with NLP or language modeling, I would really appreciate any help with this problem. My end goal is just to be able to generate a sentence. Thanks in advance!

Here is my RNN:
import torch
import torch.nn as nn
import torch.nn.functional as F


class LanguageModel(nn.Module):
    """
    Class that defines the recurrent neural network.

    Methods
    -------
    forward(input, h, c)
        Forward propagation through the RNN.
    initHidden()
        Initializes the hidden and cell states.
    """

    def __init__(self, vocabSize, seqLen = 51, embeddingDim = 30, hiddenSize = 32, numLayers = 1, bid = False):
        """
        Initializes the class.

        Parameters
        ----------
        vocabSize : int
            The length of the vocab dictionary.
        seqLen : int, optional
            The length of the input sequence.
        embeddingDim : int, optional
            The dimension of the embeddings produced by the encoder.
        hiddenSize : int, optional
            The size of the hidden state.
        numLayers : int, optional
            The number of LSTM layers.
        bid : bool, optional
            Whether the RNN should be bidirectional or not.
        """
        super(LanguageModel, self).__init__()
        self.hiddenSize = hiddenSize
        self.numLayers = numLayers
        # Set value of numDirections based on whether or not the RNN is bidirectional.
        if bid == True:
            self.numDirections = 2
        else:
            self.numDirections = 1
        self.encoder = nn.Embedding(vocabSize, embeddingDim)
        self.LSTM = nn.LSTM(input_size = embeddingDim, hidden_size = hiddenSize, num_layers = numLayers, bidirectional = bid)
        self.decoder = nn.Linear(seqLen * self.numDirections * hiddenSize, vocabSize)

    def forward(self, input, h, c):
        """
        Forward propagates through the RNN.

        Parameters
        ----------
        input : torch.Tensor
            Input to RNN. Should be formatted using makeInput() and padSeq().
        h : torch.Tensor
            Hidden state.
        c : torch.Tensor
            Cell state.

        Returns
        -------
        torch.Tensor
            Log probabilities for the predicted word from the RNN.
        """
        emb = self.encoder(input)
        emb.unsqueeze_(1)  # Add in the batch dimension so the shape is right for the LSTM
        out, (h, c) = self.LSTM(emb, (h, c))
        out = out.view(1, -1)  # Flatten all time steps into one row for the decoder
        out = self.decoder(out)
        logProbs = F.log_softmax(out, dim = 1)
        return logProbs

    def initHidden(self):
        """
        Initializes the hidden and cell states.

        Returns
        -------
        torch.Tensor
            Tensor containing the initial hidden state.
        torch.Tensor
            Tensor containing the initial cell state.
        """
        h = torch.zeros(self.numLayers * self.numDirections, 1, self.hiddenSize)
        c = torch.zeros(self.numLayers * self.numDirections, 1, self.hiddenSize)
        return h, c
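For reference, the tensor shapes that flow through `forward` can be traced with a small standalone sketch. The vocabulary size of 100 is a made-up value for illustration; the other sizes mirror the defaults in the class above, and the layers are rebuilt here outside the class only so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# vocab_size is hypothetical; the remaining sizes mirror the defaults in the post.
vocab_size, seq_len, embedding_dim, hidden_size = 100, 51, 30, 32

encoder = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size)
decoder = nn.Linear(seq_len * hidden_size, vocab_size)

tokens = torch.randint(0, vocab_size, (seq_len,))  # shape (51,)
emb = encoder(tokens).unsqueeze(1)                 # shape (51, 1, 30): (seq, batch, emb)
h0 = torch.zeros(1, 1, hidden_size)
c0 = torch.zeros(1, 1, hidden_size)
out, (h, c) = lstm(emb, (h0, c0))                  # shape (51, 1, 32): one output per time step
flat = out.reshape(1, -1)                          # shape (1, 1632): all time steps in one row
logits = decoder(flat)                             # shape (1, 100): one score per vocabulary word

for name, t in [("emb", emb), ("out", out), ("flat", flat), ("logits", logits)]:
    print(name, tuple(t.shape))
```

Because the decoder consumes all `seqLen` time steps at once, every input has to contain exactly `seqLen` tokens, and any padded positions pass through the LSTM and the decoder just like real tokens.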
Here is how I create the inputs and targets:
import torch
from torch.nn.utils.rnn import pad_sequence

# word2Idx (a dict mapping each word to its index) is defined elsewhere (not shown).


def makeInput(sentence):
    """
    Prepares a sentence for input to the RNN.

    Parameters
    ----------
    sentence : list
        The sentence to be converted into input. Should be of form: [str]

    Returns
    -------
    torch.Tensor
        Tensor of the indices for each word in the input sentence.
    """
    sen = sentence[0].split()  # Split the sentence string into individual words
    sen.insert(0, 'START')
    input = [word2Idx[word] for word in sen]  # Iterate over the words in sentence and convert to indices
    return torch.tensor(input)


def makeTarget(sentence):
    """
    Prepares a sentence to be a target.

    Parameters
    ----------
    sentence : list
        The sentence to be made into a target. Should be of form: [str]

    Returns
    -------
    torch.Tensor
        Tensor of the indices for the target phrase including the EOS tag.
    """
    sen = sentence[0].split()  # Split the sentence string into individual words
    sen.append('EOS')
    target = [word2Idx[word] for word in sen]
    target = torch.tensor(target, dtype = torch.long)
    return target.unsqueeze_(-1)  # Add a trailing dimension so target[x] has shape (1,) for the loss function


def padSeq(seq, refSeq):
    """
    Pads a sequence to be the same shape as another sequence.

    Parameters
    ----------
    seq : torch.Tensor
        The sequence to pad.
    refSeq : torch.Tensor
        The reference sequence. seq will be padded to be the same shape as refSeq.

    Returns
    -------
    torch.Tensor
        Tensor containing the padded sequence.
    """
    padded = pad_sequence([refSeq, seq])
    tmp = torch.t(padded)  # Transpose the padded sequence for easier indexing on return
    return tmp[1]  # Return only the padded seq, not both sequences
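To make the "progressively larger portion" scheme concrete, here is a small sketch of the (padded prefix, target) pairs these helpers produce for a three-word sentence. The toy `word2Idx` below, including its `'<pad>'` entry at index 0, is an assumption made for illustration; the real vocabulary is not shown in the post.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical toy vocabulary; index 0 is reserved for padding in this sketch only.
word2Idx = {'<pad>': 0, 'START': 1, 'EOS': 2, 'the': 3, 'just': 4, 'man': 5}

words = 'the just man'.split()
inputTensor = torch.tensor([word2Idx[w] for w in ['START'] + words])   # [1, 3, 4, 5]
targetTensor = torch.tensor([word2Idx[w] for w in words + ['EOS']])    # [3, 4, 5, 2]


def padSeq(seq, refSeq):
    # Same logic as in the post: zero-pad seq to the length of refSeq.
    return torch.t(pad_sequence([refSeq, seq]))[1]


pairs = []
for x in range(inputTensor.size(0)):
    prefix = padSeq(inputTensor[: x + 1], inputTensor)
    pairs.append((prefix.tolist(), targetTensor[x].item()))
    print(prefix.tolist(), '->', targetTensor[x].item())
# [1, 0, 0, 0] -> 3
# [1, 3, 0, 0] -> 4
# [1, 3, 4, 0] -> 5
# [1, 3, 4, 5] -> 2
```

`pad_sequence` pads with 0 by default, so if index 0 in the real `word2Idx` belongs to an actual word, the padding is indistinguishable from that word.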
This is my first post on this site, so I am not sure whether I have included everything needed to solve the problem. If you need more information, please let me know. Thanks!
import time

# model, optimizer, criterion, trainLoader, device and timeSince are defined elsewhere (not shown).


def train():
    """
    Trains the model.
    """
    start = time.time()
    for i, data in enumerate(trainLoader):
        inputTensor = makeInput(data)
        targetTensor = makeTarget(data)
        targetTensor = targetTensor.to(device)
        h, c = model.initHidden()
        h = h.to(device)
        c = c.to(device)
        optimizer.zero_grad()
        loss = 0
        for x in range(inputTensor.size(0)):  # Iterate over all of the words in the input sentence
            # Prepare the input for the RNN
            input = inputTensor[: x + 1]  # Use only part of the input so the RNN learns to predict the next word
            input = padSeq(input, inputTensor)
            input = input.to(device)
            out = model(input, h, c)
            l = criterion(out, targetTensor[x])
            loss += l
        loss.backward()
        optimizer.step()
        if i % 250 == 0:  # Print the model's average loss every 250 iterations.
            print('[{}] Epoch: {} -> {}'.format(timeSince(start), i, loss / inputTensor.size(0)))
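The definition of `criterion` is not shown in the post. Since `forward` returns log-probabilities, `nn.NLLLoss` would be the matching loss; the sketch below (with made-up logits and 10 hypothetical classes) illustrates that `nn.NLLLoss` on log-softmax output gives the same value as `nn.CrossEntropyLoss` on raw logits. This is an assumption about the setup, not a statement of what the code above actually uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 10)   # made-up raw decoder output for one prediction
target = torch.tensor([3])    # made-up index of the correct next word

logProbs = F.log_softmax(logits, dim=1)               # what forward() returns in the post
nll = nn.NLLLoss()(logProbs, target)                  # expects log-probabilities
ce_on_logits = nn.CrossEntropyLoss()(logits, target)  # expects raw logits

print(nll.item(), ce_on_logits.item())
print(torch.allclose(nll, ce_on_logits))
```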