PyTorch training loss stays the same with different forward pass implementations


The following code (an MNIST MLP in PyTorch) gives roughly the same training loss regardless of which of the following is used as the last layer of the forward pass:

  • F.log_softmax(x)
  • x

Option 1 is incorrect because I use criterion = nn.CrossEntropyLoss(), yet the results are almost identical anyway. Am I missing something?

    import torch
    import numpy as np
    import time
    from torchvision import datasets
    import torchvision.transforms as transforms
    # number of subprocesses to use for data loading
    num_workers = 0
    # how many samples per batch to load
    batch_size = 20
    
    # convert data to torch.FloatTensor
    transform = transforms.ToTensor()
    
    # choose the training and test datasets
    train_data = datasets.MNIST(root='data', train=True,
                                       download=True, transform=transform)
    test_data = datasets.MNIST(root='data', train=False,
                                      download=True, transform=transform)
    
    # prepare data loaders
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
        num_workers=num_workers)
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
        num_workers=num_workers)
    
    import torch.nn as nn
    import torch.nn.functional as F
    
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # linear layers (784 -> 512 -> 256 -> 10) with dropout in between
            self.fc1 = nn.Linear(28 * 28, 512)
            self.dropout1 = nn.Dropout(p=0.2, inplace=False)
            self.fc2 = nn.Linear(512, 256)
            self.dropout2 = nn.Dropout(p=0.2, inplace=False)
            self.dropout = nn.Dropout(p=0.2, inplace=False)  # defined but unused in forward
            self.fc3 = nn.Linear(256, 10)
    
    
        def forward(self, x):
            # flatten image input
            x = x.view(-1, 28 * 28)
            # add hidden layer, with relu activation function
            x = F.relu(self.fc1(x))
            x = self.dropout1(x)
            x = F.relu(self.fc2(x))
            x = self.dropout2(x)
            x = self.fc3(x)
    #        return F.log_softmax(x)  # option 1: extra log-softmax in the forward pass
            return x  # option 2: return raw logits
    
    # initialize the NN
    model = Net()
    print(model)
    model.to('cuda')
    criterion = nn.CrossEntropyLoss()
    
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    n_epochs = 10
    
    model.train() # prep model for training
    
    for epoch in range(n_epochs):
        # monitor training loss
        train_loss = 0.0
    
        start = time.time()
        for data, target in train_loader:
            data, target = data.to('cuda'), target.to('cuda')
    
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()*data.size(0)
    
        train_loss = train_loss/len(train_loader.dataset)
    
        print('Epoch: {} \tTraining Loss: {:.6f} \ttime: {:.6f}'.format(
            epoch+1,
            train_loss,
            time.time()-start
            ))
    

    nn.CrossEntropyLoss() is implemented with the softmax layer built in, for numerical stability. That is why you should not use a softmax layer in your forward pass.

    From the documentation ():

    This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
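
    As a concrete illustration of that sentence, the following minimal sketch (my own check, not code from the question) compares nn.CrossEntropyLoss on raw logits with nn.NLLLoss applied to F.log_softmax of the same logits; the two losses should match up to floating-point error:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits = torch.randn(4, 10)           # batch of 4 samples, 10 classes
    targets = torch.tensor([1, 0, 3, 9])  # ground-truth class indices

    ce = nn.CrossEntropyLoss()(logits, targets)
    nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

    print(ce.item(), nll.item())  # both values should agree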

    Using a softmax layer in the forward pass will lead to worse metrics, because the gradient magnitudes are lowered (and thus so are the weight updates). I learned it the hard way!
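
    The gradient shrinkage is easy to observe directly. The sketch below (my own illustration, not the poster's code) passes the same logits to nn.CrossEntropyLoss once as-is and once with an extra log_softmax in front, then compares the gradient norms with respect to the logits:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits = torch.randn(20, 10, requires_grad=True)
    targets = torch.randint(0, 10, (20,))
    criterion = nn.CrossEntropyLoss()

    # Correct usage: raw logits go straight into CrossEntropyLoss.
    loss_raw = criterion(logits, targets)
    grad_raw, = torch.autograd.grad(loss_raw, logits)

    # "Double softmax": log_softmax is applied first, and CrossEntropyLoss
    # then applies its own log-softmax on top of it.
    loss_double = criterion(F.log_softmax(logits, dim=1), targets)
    grad_double, = torch.autograd.grad(loss_double, logits)

    # The gradient norm in the double-softmax case is typically noticeably smaller.
    print(grad_raw.norm().item(), grad_double.norm().item())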

    I guess your problem is that the losses are similar at the beginning of training, but they should not be by the end of training. It is usually a good sanity check to overfit your model on a single batch of data: if the batch is small enough, the model should reach 100% accuracy. If the model takes too long to train, you probably have a bug somewhere.
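
    A rough version of that sanity check, reusing the Net, train_loader, and hyperparameters from the question, could look like this:

    import torch

    model = Net().to('cuda')
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Grab one small batch and train on it alone.
    data, target = next(iter(train_loader))
    data, target = data.to('cuda'), target.to('cuda')

    model.train()
    for step in range(500):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

    acc = (output.argmax(dim=1) == target).float().mean().item()
    print('loss: {:.4f}  batch accuracy: {:.2%}'.format(loss.item(), acc))
    # With a batch of 20 samples the model should get (close to) 100% of them
    # right; if it does not, something in the training loop is probably broken.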


    Hope that helps. =)

    Thanks. In this case the "incorrect" code gives almost the same accuracy as the correct one, even after 50 epochs.

    Did you try to overfit your model on a single batch of data? If it does not overfit, there is a bug somewhere in the code that breaks the gradient flow. Or maybe the model is simply not expressive enough for the task (too simple). A learning rate that is too high or too low could also affect your results.

    50 epochs, no dropout (so I would expect that to cause overfitting), with the "incorrect" code, and the results are still perfectly decent.