Binary classification on MNIST with a neural network: loss and accuracy remain constant

I'm trying to do binary classification on the MNIST dataset: class 0 for even digits and class 1 for odd digits. I'm using a simplified version of VGG. The network's loss and accuracy remain constant across epochs. I should point out that this model, before I converted the targets to binary, reached over 90% accuracy, so something must have gone wrong. Here is where I change the targets to binary:

# Relabel the targets: even digits -> class 0, odd digits -> class 1
for i in range(10):
    idx = (train_set.targets == i)
    if (i % 2) == 0:  # this already covers i == 0, so the extra check was redundant
        train_set.targets[idx] = 0
    else:
        train_set.targets[idx] = 1

for i in range(10):
    idx = (test_set.targets == i)
    if (i % 2) == 0:
        test_set.targets[idx] = 0
    else:
        test_set.targets[idx] = 1
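For reference, the same relabeling can be done in one vectorized step, since the even/odd split is just the parity of the label. A minimal sketch, assuming `train_set.targets` and `test_set.targets` are tensors as in torchvision's MNIST:

train_set.targets = train_set.targets % 2  # even digits -> 0, odd digits -> 1
test_set.targets = test_set.targets % 2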
Here is my network:

class VGG16(nn.Module):

    def __init__(self, num_classes):
        super(VGG16, self).__init__()

        # calculate same padding:
        # (w - k + 2*p)/s + 1 = o
        # => p = (s(o-1) - w + k)/2

        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=1,
                      out_channels=64,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      # (1(32-1)- 32 + 3)/2 = 1
                      padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(in_channels=64,
                      out_channels=64,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_2 = nn.Sequential(
            nn.Conv2d(in_channels=64,
                      out_channels=128,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(in_channels=128,
                      out_channels=128,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )
        
        self.block_3 = nn.Sequential(
            nn.Conv2d(in_channels=128,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256,
                      out_channels=256,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )

        self.block_4 = nn.Sequential(
            nn.Conv2d(in_channels=256,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,
                      out_channels=512,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2))
        )            

        self.classifier = nn.Sequential(
            nn.Linear(2048, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.65),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=0.65),
            nn.Linear(4096, num_classes),
            nn.Sigmoid() 
        )

        for m in self.modules():
            if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Linear):
                nn.init.kaiming_uniform_(m.weight, mode='fan_in', nonlinearity='leaky_relu')
#                 nn.init.xavier_normal_(m.weight)
                if m.bias is not None:
                    m.bias.detach().zero_()

        # self.avgpool = nn.AdaptiveAvgPool2d((7, 7))

    def forward(self, x):

        x = self.block_1(x)
        x = self.block_2(x)
        x = self.block_3(x)
        x = self.block_4(x)
        # x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
        #logits = self.classifier(x)
        #probas = F.softmax(logits, dim=1)
        # probas = nn.Softmax(logits)
        #return probas
        # return logits
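As a sanity check on the network above: the classifier's first Linear layer expects 2048 input features, which only matches 512 channels times a 2x2 spatial map, i.e. a 32x32 input halved by four max pools (32 -> 16 -> 8 -> 4 -> 2). A quick shape probe, assuming the MNIST images are resized or padded to 32x32 as the padding comments suggest:

import torch

model = VGG16(num_classes=1)
model.eval()  # use BatchNorm running stats for the probe
with torch.no_grad():
    out = model(torch.randn(2, 1, 32, 32))  # four max pools: 32 -> 16 -> 8 -> 4 -> 2
print(out.shape)  # torch.Size([2, 1]); with raw 28x28 inputs the flatten would be 512, not 2048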
Compared with my previous digit-recognition model, I only changed the targets and the last layer of the classifier, from 10 classes to 1 class + Sigmoid. I also changed the loss from cross entropy to binary cross entropy (BCE). What am I doing wrong?
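For context on that BCE setup: `nn.BCELoss` expects a float target with the same shape as the sigmoid output, so integer 0/1 labels need to be converted and unsqueezed to (N, 1), and accuracy is then computed by thresholding at 0.5. A minimal sketch of one training step under those assumptions (the loader and learning rate are illustrative, not taken from the question):

import torch
import torch.nn as nn

model = VGG16(num_classes=1)
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, labels in train_loader:                        # labels: LongTensor of 0/1 parities
    optimizer.zero_grad()
    probs = model(images)                                  # (N, 1), sigmoid outputs in (0, 1)
    loss = criterion(probs, labels.float().unsqueeze(1))   # BCELoss needs a float (N, 1) target
    loss.backward()
    optimizer.step()
    preds = (probs > 0.5).long().squeeze(1)                # threshold the probabilities
    correct = (preds == labels).sum().item()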

These are the loss and accuracy values:

Epoch 1: TrL=49.0955, TrA=31.4211, VL=49.7285, VA=31.7340, TeL=49.2635, TeA=31.3758,
Epoch 2: TrL=49.0992, TrA=31.4235, VL=49.7285, VA=31.7340, TeL=49.2635, TeA=31.3758,
Epoch 3: TrL=49.0899, TrA=31.4176, VL=49.7285, VA=31.7340, TeL=49.2635, TeA=31.3758,
Epoch 4: TrL=49.0936, TrA=31.4199, VL=49.7285, VA=31.7340, TeL=49.2635, TeA=31.3758,
Epoch 5: TrL=49.0936, TrA=31.4199, VL=49.7285, VA=31.7340, TeL=49.2635, TeA=31.3758,
Epoch 6: TrL=49.0825, TrA=31.4128, VL=49.7285, VA=31.7340, TeL=49.2635, TeA=31.3758,
What is going on? How can the accuracy be over 90% with 10 classes, while this simplified version with only 2 classes reaches just 30%?


Edit: after increasing the batch size from 64 to 128, the accuracy reaches 60% and stays there…

In my opinion, the problem is the varied appearance of the digits within the odd and even classes. Take 1 and 3: pictures of these digits look very different from one another, so the convolutional network struggles to extract features shared by the whole class. The network already reaches 90% accuracy with 10 classes, so why convert to 2 classes at all? If you know the digit is 1, 3, 5, 7 or 9, you already know it is odd.

Because I'm working on a transfer-learning project, and for simplicity I wanted two neural networks doing binary classification; I'd like both to reach at least 70% accuracy. MNIST has 60,000 samples, and I thought binary classification on this dataset would be simpler than multi-class classification. Learning a binary classifier is usually easier, but in this problem each class groups 5 different kinds of pictures. For example, with dogs vs. cats, binary classification is simpler than multi-class classification because all the cats look similar to each other (ears, muzzle). A basic neural network gets high accuracy on this dataset, so if a more complex network does worse, I think the problem lies only in the 2-class labels.