Machine learning: loss of conv neural network not decreasing, just oscillating

Tags: machine-learning, neural-network, pytorch, conv-neural-network

I have a convolutional neural network in a VGG-style architecture (below) that is supposed to classify whether a picture shows a cat or a dog. My training set contains 25,000 images, cropped to 256 px on each side. I have tried different learning rates, different loss functions, and so on, but my loss just keeps oscillating between 0.692 and 0.694 and does not decrease.

import random
from os import listdir

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    normalize
])

# Target = [isCat, isDog]
train_data_list = []
train_data = []
target_list = []
plotlist = []
train_files = listdir("data/catsdogs/train/")


def loadTrainData():
    global train_data_list
    global train_data
    global target_list
    print("Loading data now...")

    amount = len(listdir("data/catsdogs/train/"))
    current = 0

    for i in range(amount):
        r = random.randint(0, len(train_files) - 1)
        file = train_files[r]
        train_files.remove(file)

        img = Image.open("data/catsdogs/train/" + file)
        img_tensor = transform(img)  # (3, 256, 256)

        isCat = 1 if 'cat' in file else 0
        isDog = 1 if 'dog' in file else 0
        target = [isCat, isDog]

        train_data_list.append(img_tensor)
        target_list.append(target)

        if len(train_data_list) >= 64:
            train_data.append((torch.stack(train_data_list), target_list))
            train_data_list = []
            target_list = []

        current = current + 1
        print("Loaded: {:.1f}%".format(current * 100 / amount))
    print("Loaded data successfully!")


class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1)

        self.conv3 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1)

        self.conv5 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.conv6 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)

        self.dropout = nn.Dropout2d()
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)

        self.fc1 = nn.Linear(5184, 1296)
        self.fc2 = nn.Linear(1296, 2)

    def forward(self, x):
        # Block 1
        x = self.conv1(x)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.relu(x)

        x = self.pool(x)

        # Block 2
        x = self.conv3(x)
        x = self.relu(x)

        x = self.conv4(x)
        x = self.relu(x)

        x = self.pool(x)

        # Block 3
        x = self.conv5(x)
        x = self.relu(x)

        x = self.conv6(x)
        x = self.relu(x)

        x = self.pool(x)

        x = x.view(-1, 5184)

        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)

        return torch.sigmoid(x)


model = Network()
model = model.cuda()

optimizer = optim.SGD(model.parameters(), lr=0.0001, weight_decay=0.0016)


def train(epoch):
    global optimizer

    model.train()
    batch_id = 0
    for data, target in train_data:
        data = data.cuda()
        target = torch.Tensor(target).cuda()

        data = Variable(data)
        target = Variable(target)

        optimizer.zero_grad()

        out = model(data)
        criterion = F.binary_cross_entropy

        loss = criterion(out, target)
        loss.backward()

        optimizer.step()

        plotlist.append(loss.item())  # store the float, not the tensor (keeping the tensor retains the graph)

        print('Train Epoch: {},  {:.0f}% ,\tLoss: {:.6f}'.format(
            epoch, 100. * batch_id / len(train_data), loss.item()
        ))
        batch_id = batch_id + 1


loadTrainData()

for epoch in range(25):
    train(epoch)

plt.plot(plotlist)
plt.ylabel("Loss")
plt.savefig("lossPlot.png")  # save before show(), which clears the figure
plt.show()
Here is my loss plot over 5 epochs:

[loss plot: loss oscillating between 0.692 and 0.694, no downward trend]

Also, the higher the learning rate, the stronger the oscillation; with lr = 0.1 the loss bounces between 0.5 and 0.7.

Have you tried adding momentum to your SGD optimizer?

optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=0.0016, momentum=0.9)
Alternatively, another optimizer such as Adam or Adadelta would use an adaptive learning rate.
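For example, a minimal sketch of swapping in one of those optimizers (the `lr` here is just the common default, not tuned for this problem; `nn.Linear` stands in for the question's `Network`):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # stand-in for the Network from the question

# Adam keeps a per-parameter adaptive learning rate; 1e-3 is its usual default
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Adadelta needs essentially no manual learning-rate tuning:
# optimizer = optim.Adadelta(model.parameters())
```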

Also, it looks like your training data is never shuffled. Could it be that some batches are all cats and others all dogs, pulling the gradient descent in opposite directions every few steps? It is best to shuffle your training data after every epoch and build the batches on top of that.
The torch.utils.data.DataLoader class can help with this.
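A minimal sketch of that, using toy tensors in place of the (image, target) pairs built in loadTrainData(); DataLoader handles batching and reshuffles the samples every epoch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy stand-ins for the real data, which would be (N, 3, 256, 256) images
images = torch.randn(100, 3, 8, 8)
targets = torch.randint(0, 2, (100, 2)).float()

dataset = TensorDataset(images, targets)
# shuffle=True draws a fresh random sample order on every pass over the loader,
# so no batch ends up all-cat or all-dog by accident
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for data, target in loader:
    pass  # data: (<=64, 3, 8, 8), target: (<=64, 2)
```

Note that, unlike the manual batching in the question, DataLoader also yields the final partial batch instead of silently dropping images that do not fill a batch of 64.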

What is the naming scheme of your files? Do the variables "isCat" and "isDog" get the correct values?
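A quick sanity check for that, assuming the usual Kaggle cats-vs-dogs naming scheme (cat.0.jpg, dog.0.jpg, ...) — the file list below is a hypothetical stand-in for listdir("data/catsdogs/train/"):

```python
from collections import Counter

# hypothetical file names; in the question this would be listdir("data/catsdogs/train/")
train_files = ["cat.0.jpg", "cat.1.jpg", "dog.0.jpg"]

counts = Counter()
for file in train_files:
    # same label logic as in loadTrainData()
    isCat = 1 if 'cat' in file else 0
    isDog = 1 if 'dog' in file else 0
    counts[(isCat, isDog)] += 1

# both (1, 0) and (0, 1) should appear; (0, 0) or (1, 1) means a bad filename
print(counts)
```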


What happens when you try training on only 100 examples? Is your model able to learn the training data in that simplified case? That should hopefully rule out some obvious bugs.

Looks like the variables are assigned correctly here.

When I took that out and put Adam in instead, I think it worked fine. The loss curve looks much better, and I think I can even see the network overfitting the training data. But the loss is still pretty bad...

Try a smaller lr such as 1e-3 and remove weight_decay from the optimizer; if that does not help, try 1e-4 and overfitting on a single batch of data instead of the 25k examples. BTW, Variable is deprecated, don't use it.
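Putting those suggestions together, a sketch of the single-batch overfitting test — Adam, no weight_decay, no Variable. The tiny model and 8x8 tensors are stand-ins for the real Network and data; the point is only the shape of the experiment:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# small stand-in for the Network from the question (sigmoid output, 2 classes)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2), nn.Sigmoid())
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # note: no weight_decay

data = torch.randn(64, 3, 8, 8)                      # one fixed batch
target = torch.randint(0, 2, (64, 2)).float()

losses = []
for step in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = nn.functional.binary_cross_entropy(out, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# a healthy setup drives the loss on this single fixed batch well below its starting value;
# if it stays flat near 0.693, something in the model, targets, or loss wiring is broken
```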