Neural network: why does training accuracy drop after reaching a perfect fit on the training set?

neural-network pytorch

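I am training a NN in PyTorch to learn the MNIST data. The model starts well, improves, reaches good accuracy on both the training and test data, stays stable for a while, and then both the test and training accuracy collapse, as shown in the figure below.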

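For MNIST I use 60000 training images and 10000 test images, a training batch size of 100, and a learning rate of 0.01. The network consists of two fully connected hidden layers with 100 nodes each, and the nodes use ReLU activations. F.cross_entropy is used for the loss and SGD for the gradient updates.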

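This is not an overfitting problem, because both the training and the test accuracy drop. I suspected it had to do with the learning rate being too high. In the base case I used 0.01, but when I lowered it to 0.001 the whole pattern repeated, only later, as shown in the figure (note the change in the x-axis scale; the pattern occurs roughly 10 times later, which is intuitive). Using even lower learning rates gave similar results.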

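I have tried unit tests, checking the individual parts, and shrinking the model. Here is the result when I use only 6 data points in the training set, with a batch size of 2. A perfect fit on the training data (clearly different from the test accuracy here, as expected) is not surprising, but it still drops from 100% to 1/6, so not much better than a random pick. Can anyone tell me what needs to happen for a network to fall out of a perfect fit on the training set?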
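The shrunken setup described above (6 training points, batch size 2) can be sketched roughly as follows. This is only an illustration under assumptions: the random `TensorDataset` is a hypothetical stand-in for the MNIST training set, and the names `full_train_set`, `tiny_train_set`, and `tiny_loader` do not appear in the original post.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for the MNIST training set:
# 100 random 1x28x28 "images" with random labels in 0..9.
full_train_set = TensorDataset(
    torch.randn(100, 1, 28, 28),
    torch.randint(0, 10, (100,)),
)

# Keep only the first 6 samples and serve them in batches of 2.
tiny_train_set = Subset(full_train_set, range(6))
tiny_loader = DataLoader(tiny_train_set, batch_size=2, shuffle=True)

# One pass over the loader yields 3 batches of 2 images each.
num_batches = len(tiny_loader)
batch_shapes = [tuple(images.shape) for images, _ in tiny_loader]
```

With a dataset this small, one epoch is only three optimizer steps, which makes it cheap to inspect the loss and gradients after every step.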

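Here is the structure of the network (the relevant libraries were imported beforehand), although I hope the symptoms above are enough for you to recognize the problem without it: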

class Network(nn.Module):
    def __init__(self):
        # call the constructor of the parent class nn.Module
        super(Network, self).__init__()

        # fc stands for 'fully connected'
        self.fc1 = nn.Linear(in_features=28*28, out_features=100)
        self.fc2 = nn.Linear(in_features=100, out_features=100)
        self.out = nn.Linear(in_features=100, out_features=10)

    def forward(self, t):

        # (1) input layer (redundant)
        t = t

        # (2) hidden linear layer
        # As my t consists of 28*28 pixel pictures, I need to flatten them:
        t = t.reshape(-1, 28*28)
        # Pass the reshaped input through the first linear layer
        t = self.fc1(t)
        # Apply ReLU as the activation function
        t = F.relu(t)

        # (3) hidden linear layer
        # As above, but reshaping is not needed now
        t = self.fc2(t)
        t = F.relu(t)

        # (4) output layer
        t = self.out(t)
        t = F.softmax(t, dim=1)

        return t

The main part of the code:

for b in range(epochs):
    print('***** EPOCH NO. ', b+1)
    # getting a batch iterator
    batch_iterator = iter(batch_train_loader)
    # For loop for a single epoch, based on the length of the training set and the batch size
    for a in range(round(train_size/b_size)):
        print(a+1)
        # get one batch for the iteration
        batch = next(batch_iterator)
        # decomposing a batch
        images, labels = batch[0].to(device), batch[1].to(device)
        # to get a prediction, pass the samples through the network
        preds = network(images)
        # with the predictions, we will use F to get the loss as cross_entropy
        loss = F.cross_entropy(preds, labels)
        # function for counting the number of correct predictions
        get_num_correct(preds, labels)
        # calculate the gradients needed for update of weights
        loss.backward()
        # with the known gradients, we will update the weights according to stochastic gradient descent
        optimizer = optim.SGD(network.parameters(), lr=learning_rate)
        # with the known gradients, step in the direction of correct estimation
        optimizer.step()
        # check if the whole data check should be performed (for taking full training/test data checks only in evenly spaced intervals on the log scale, pre-calculated later)
        if counter in X_log:
            # get the result on the whole train data and record them
            full_train_preds = network(full_train_images)
            full_train_loss = F.cross_entropy(full_train_preds, full_train_labels)
            # Record train loss
            a_train_loss.append(full_train_loss.item())
            # Get a proportion of correct estimates, to make them comparable between train and test data
            full_train_num_correct = get_num_correct(full_train_preds, full_train_labels)/train_size
            # Record train accuracy
            a_train_num_correct.append(full_train_num_correct)
            print('Correct predictions of the dataset:', full_train_num_correct)
            # Repeat for test predictions
            # get the results for the whole test data
            full_test_preds = network(full_test_images)
            full_test_loss = F.cross_entropy(full_test_preds, full_test_labels)
            a_test_loss.append(full_test_loss.item())
            full_test_num_correct = get_num_correct(full_test_preds, full_test_labels)/test_size
            a_test_num_correct.append(full_test_num_correct)
        # update counter
        counter = counter + 1

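I have searched here and looked at the answers to similar questions, but people either ask about overfitting, or their NNs do not improve accuracy on the training set at all (i.e. they simply do not work), rather than finding a good training fit and then losing it completely, on the training set as well. I hope I have not posted something obvious. I am relatively new to NNs, but I did my best to research the topic before posting. Thank you for your help and understanding.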

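So, my take is that you are using too many epochs and overtraining the model (not overfitting). After a certain point of constantly refreshing the biases/weights, they can no longer distinguish values from noise.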

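I suggest you check this out and see whether it matches what you are seeing, since it was the first thing that came to my mind.

Also take a look at this post. (Not saying it is a duplicate.)

And this publication: Overtraining in back-propagation neural networks: A CRT color calibration example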

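The cause was a bug in the code. We need to add

optimizer.zero_grad()

at the start of the training loop, and create the optimizer before the outer training loop, i.e.

optimizer = optim.SGD(...)
for b in range(epochs):

The linked post explains why.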
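In short, PyTorch adds each `backward()` result to the gradients already stored in `.grad` instead of overwriting them, so without `zero_grad()` every step uses the sum of all gradients computed so far. The sketch below illustrates this on toy data and then shows one way to arrange the loop. The random `images`/`labels` tensors and the single-layer `model` are assumptions for illustration, not the network from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy batch standing in for flattened MNIST images and labels.
images = torch.randn(4, 28 * 28)
labels = torch.tensor([0, 1, 2, 3])

# Illustrative single-layer model; it outputs raw scores (logits).
model = nn.Linear(28 * 28, 10)

# Without zero_grad(): the second backward() adds to the stored gradient.
F.cross_entropy(model(images), labels).backward()
grad_once = model.weight.grad.clone()
F.cross_entropy(model(images), labels).backward()
grad_twice = model.weight.grad.clone()

# The weights did not change in between, so the stored gradient doubled.
accumulated = torch.allclose(grad_twice, 2 * grad_once, atol=1e-6)

# Loop arranged as suggested above: the optimizer is created once,
# and the gradients are cleared at the start of every iteration.
optimizer = optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):
    optimizer.zero_grad()                          # drop gradients from the previous step
    loss = F.cross_entropy(model(images), labels)  # loss on the current batch
    loss.backward()                                # gradients of this batch only
    optimizer.step()                               # plain SGD update
```

Here `accumulated` comes out `True`, which is the behaviour that makes the original loop take ever larger steps as training goes on.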

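I used 10 epochs, which I would not classify as "too many", but I may be wrong: is going through the whole training set 10 times too much? I am trying to replicate the results of Baity-Jesi 2019, where they run 10^6 steps, meaning thousands of epochs, and they get very stable solutions (at least on the training set, not on the test set), so I doubt the number of epochs is the problem. I have read the links you provided, thank you for sharing them.

The first one, if I understand it correctly, is about overfitting to the training data (or overtraining, which the author seems to use with a very similar meaning), but still about how it hurts new predictions: "This overfitting of the training dataset will result in an increase in generalization error, making the model less useful at making predictions on new data." So I see nothing there that relates to my problem: he does not write about the error increasing on the training dataset, only on the test (or validation) dataset, which is a typical overfitting problem. Early stopping is discussed as a way to deal with this overfitting problem, but that is not what I have.

The second post makes it clear that there is a semantic problem with words like overtraining and overfitting, which people use very freely, but as the top answer says, "as far as I know, there is no difference between an overtrained and an overfitted model". In fact, both words are used to describe the same problem: you overtrain your model on the training data, which leads to the model overfitting, i.e. having a high error on the test (new/validation) data, which is unrelated to a high error on the training data.

The third link, the Alman and Ningfang 2001 paper, again talks about overfitting, although they call it overtraining. Looking at Figure 1 makes it clear that the test error increases at some point while the training error keeps decreasing, whereas in my problem the training error increases along with the test error.

Thank you for trying to answer my question, but as stated above, all three links deal with overfitting, or in other words overtraining, which is not the problem here, and 10 epochs does not seem excessive, given that Baity-Jesi 2019 obtained stable results with far more than that.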