Character encoding: LSTM model doesn't seem to learn from training


My task: write an LSTM which, given a sequence of characters, determines whether the sequence is positive ([1-9]+a+[1-9]+b+[1-9]+[1-9]+c+[1-9]+[1-9]+d+[1-9]) or negative (the same, but with b and c swapped). I'm not familiar with LSTMs or how they work, so I based my model on this PyTorch tutorial:

I index every character in each sequence and build a tensor from the index sequence. I then feed this as input to the model below:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RNN(nn.Module):
    def __init__(self, vocab_size, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.E = nn.Embedding(vocab_size, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.E(x)
        # batch_size and sequence_length are globals here
        x = x.view(batch_size, sequence_length, -1)
        # Forward propagate LSTM
        out, _ = self.lstm(x)  # out: tensor of shape (batch_size, seq_length, hidden_size)

        # Decode the hidden state of the last time step
        out = torch.sigmoid(self.fc1(out[:, -1, :]))
        return F.softmax(out, dim=1)
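For context, a minimal sketch of what the indexing step described above might look like; the vocabulary, the encode helper, and the example strings are assumptions for illustration, not the asker's actual code:

import torch

# Hypothetical vocabulary for this task: the digits 1-9 plus the letters a-d
vocab = list("123456789abcd")
char2idx = {ch: i for i, ch in enumerate(vocab)}

def encode(sequence):
    # nn.Embedding expects integer indices, so map each character
    # to its index and wrap the result in a LongTensor
    return torch.tensor([char2idx[ch] for ch in sequence], dtype=torch.long)

# One positive and one negative example of equal length (b and c swapped)
batch = torch.stack([encode("123aa45bb67cc89dd12"),
                     encode("123aa45cc67bb89dd12")])  # shape: (2, 19)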
However, when I train the model this way:

model = RNN(len(indexed_vocab), input_size, hidden_size, num_layers, num_classes).to(device)
for param in model.parameters():
    param.requires_grad = True

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (chars, labels) in enumerate(train_loader, 0):
        chars = chars.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        model.train()
        # Forward pass
        outputs = model(chars)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))

The training loss does not decrease, and I get 50% validation accuracy. I would really appreciate any help, even just pointing out errors in my implementation.

Two guesses: put your zero_grad at the start of the inner (i, (chars, labels)) for loop, and remove the sigmoid before the softmax. You currently have two activation functions in a row, which can affect your gradients (but shouldn't cause training to stall...).
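As a minimal sketch of the second guess applied to the forward pass above: since nn.CrossEntropyLoss computes log-softmax internally, the model can return raw logits with no sigmoid or softmax at all (the rest of the class is assumed unchanged).

def forward(self, x):
    # x: (batch_size, sequence_length) LongTensor of character indices
    x = self.E(x)                   # (batch_size, sequence_length, input_size)
    out, _ = self.lstm(x)           # (batch_size, sequence_length, hidden_size)
    # Return raw logits for the last time step; nn.CrossEntropyLoss
    # applies log-softmax internally, so no sigmoid or softmax here.
    return self.fc1(out[:, -1, :])  # (batch_size, num_classes)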