LSTM model doesn't seem to learn from training
Tags: character-encoding, pytorch, lstm, embedding

My task is to write an LSTM acceptor that, given a sequence of characters, determines whether the sequence is positive ([1-9]+a+[1-9]+b+[1-9]+[1-9]+c+[1-9]+[1-9]+d+[1-9]) or negative (the same pattern, but with b and c swapped). I am new to LSTMs and how they work, so I based my model on a PyTorch tutorial. I index every character in each sequence and build a tensor from the resulting index sequence. I then feed that tensor to the following model:
class RNN(nn.Module):
    def __init__(self, vocab_size, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.E = nn.Embedding(vocab_size, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.E(x)
        x = x.view(batch_size, sequence_length, -1)
        # Forward propagate LSTM
        out, _ = self.lstm(x)  # out: tensor of shape (batch_size, seq_length, hidden_size)
        # Decode the hidden state of the last time step
        out = torch.sigmoid(self.fc1(out[:, -1, :]))
        return F.softmax(out, dim=1)
However, when I train the model like this:
model = RNN(len(indexed_vocab), input_size, hidden_size, num_layers, num_classes).to(device)
for param in model.parameters():
    param.requires_grad = True

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (chars, labels) in enumerate(train_loader, 0):
        chars = chars.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        model.train()

        # Forward pass
        outputs = model(chars)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))
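The training loss does not decrease, and I get 50% validation accuracy. I would greatly appreciate any help, even if it only points out mistakes in my implementation.

Comment: Two guesses: put your zero_grad at the start of the inner `for i, (chars, labels)` loop, and remove the sigmoid before the softmax. You have two activation functions in a row, which can affect your gradients (although that alone should not make training stall...).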
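To put a number on the stacked-activation point: `nn.CrossEntropyLoss` is documented as applying log-softmax itself and expecting raw, unnormalized scores, so the posted `forward` effectively runs sigmoid, then softmax, then a second softmax inside the loss. The sketch below reproduces that arithmetic in plain Python for a two-class case. The helper names (`loss_as_posted`, `cross_entropy`) and the sample scores are made up for illustration, and this only shows one likely contributor to the flat loss, not a confirmed diagnosis of the asker's run.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def cross_entropy(scores, target):
    """-log(softmax(scores)[target]), the formula nn.CrossEntropyLoss documents."""
    return -math.log(softmax(scores)[target])


def loss_as_posted(logits, target):
    """Hypothetical helper mirroring the posted forward(): sigmoid, then softmax,
    followed by the softmax that the loss applies internally."""
    squashed = softmax([sigmoid(v) for v in logits])
    return cross_entropy(squashed, target)


confident = [50.0, -50.0]  # fc1 output that is overwhelmingly sure of class 0
untrained = [0.0, 0.0]     # fc1 output with no preference at all

floor = loss_as_posted(confident, 0)  # best loss the posted head can reach
start = loss_as_posted(untrained, 0)  # loss of an undecided model, ln(2)
direct = cross_entropy(confident, 0)  # same confident scores, raw logits

print(f"posted head, confident: {floor:.4f}")
print(f"posted head, undecided: {start:.4f}")
print(f"raw logits, confident:  {direct:.4f}")
```

With the extra activations, even a perfectly confident and correct prediction keeps the loss well above zero, so the printed loss can only move in a narrow band below ln(2) and the gradients reaching `fc1` are heavily damped. Under that reading, one option is to return `self.fc1(out[:, -1, :])` directly from `forward` and apply softmax only when probabilities are needed at inference time.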