PyTorch training loss unchanged with different forward pass implementations

The following code (an MNIST MLP in PyTorch) gives approximately the same training loss regardless of whether the last layer in the forward pass is:

1) F.log_softmax(x)
2) x

Option 1 should be incorrect, because I use criterion = nn.CrossEntropyLoss(), and yet the results are almost the same. Am I missing something?
import torch
import numpy as np
import time
from torchvision import datasets
import torchvision.transforms as transforms
# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20
# convert data to torch.FloatTensor
transform = transforms.ToTensor()
# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # linear layers (784 -> 512 -> 256 -> 10)
        self.fc1 = nn.Linear(28 * 28, 512)
        self.dropout1 = nn.Dropout(p=0.2, inplace=False)
        self.fc2 = nn.Linear(512, 256)
        self.dropout2 = nn.Dropout(p=0.2, inplace=False)
        self.dropout = nn.Dropout(p=0.2, inplace=False)  # unused
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # hidden layers with relu activation and dropout
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        # return F.log_softmax(x, dim=1)  # option 1
        return x                          # option 2
# initialize the NN
model = Net()
print(model)
model.to('cuda')
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
n_epochs = 10
model.train() # prep model for training
for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    start = time.time()
    for data, target in train_loader:
        data, target = data.to('cuda'), target.to('cuda')
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()*data.size(0)
    train_loss = train_loss/len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f} \ttime: {:.6f}'.format(
        epoch+1,
        train_loss,
        time.time()-start
    ))
nn.CrossEntropyLoss() is implemented with the softmax step inside, for numerical stability. So you should not apply another softmax layer in your forward pass.

From the documentation:

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

Using a softmax layer in the forward pass will lead to worse metrics, because the gradient magnitudes are lowered (and thus the weight updates as well). I learned it the hard way!
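To see that equivalence concretely, here is a minimal self-contained sketch (random logits, no model needed; the tensor shapes are arbitrary) checking that nn.CrossEntropyLoss on raw logits matches nn.NLLLoss applied after F.log_softmax:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # a batch of 4 samples, 10 classes
target = torch.randint(0, 10, (4,))  # random class labels

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))  # True: CrossEntropyLoss = LogSoftmax + NLLLoss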
I guess your problem is that at the beginning of training the losses are similar, but by the end of training they shouldn't be. It is usually a good sanity check to overfit your model on a single batch of data: if the batch is small enough, the model should reach 100% accuracy. If the model takes too long to train, then you probably have a bug somewhere.
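A minimal sketch of that sanity check, reusing model, criterion, optimizer and train_loader from the code above (the step count of 300 is an arbitrary choice; with dropout active the loss will fluctuate a bit but should still approach zero):

# overfit-one-batch sanity check
data, target = next(iter(train_loader))
data, target = data.to('cuda'), target.to('cuda')
model.train()
for step in range(300):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
acc = (output.argmax(dim=1) == target).float().mean().item()
print('final loss: {:.6f}  accuracy: {:.2%}'.format(loss.item(), acc))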
Hope that helps. =)

Thanks. In this case the "wrong" code gives almost the same accuracy as the correct code, even after 50 epochs. Am I missing something?

Did you try to overfit your model on one batch of data? If it doesn't overfit, it's because there is a bug somewhere in the code that breaks the gradient flow, or the model may simply not be expressive enough (too simple) for the task. A learning rate that is too high or too low could also influence your results.

50 epochs, no dropout (so I would expect it to overfit), using the "wrong" code, and the results are definitely not bad.
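For what it's worth, the near-identical results are consistent with a property of log_softmax itself (which is what the commented-out option 1 actually uses, rather than a plain softmax): log_softmax is idempotent, so feeding F.log_softmax(x) into nn.CrossEntropyLoss, which applies log_softmax internally, yields exactly the same loss as feeding the raw logits. A minimal sketch:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 10)
once = F.log_softmax(x, dim=1)
twice = F.log_softmax(once, dim=1)
print(torch.allclose(once, twice))  # True: log_softmax is idempotent

target = torch.randint(0, 10, (4,))
loss_raw = F.cross_entropy(x, target)     # option 2: raw logits
loss_lsm = F.cross_entropy(once, target)  # option 1: log_softmax output
print(torch.allclose(loss_raw, loss_lsm)) # True: identical losses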