PyTorch vs. Keras: PyTorch model overfits heavily


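For several days now I have been trying to replicate my Keras training results with PyTorch. Whatever I do, the PyTorch model overfits far earlier and more strongly on the validation set than the Keras model does. For PyTorch I use the same Xception code, taken from the repository credited in the code comment below.

Data loading, augmentation, validation, training schedule and so on are equivalent. Am I missing something obvious? There must be a general problem somewhere. I have tried thousands of different module constellations, but nothing comes even close to the Keras training. Can somebody help?

Keras model: val accuracy > 90%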

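# imports (assuming standalone Keras, as in the optimizers import further down)
from keras import applications
from keras.layers import Dense, Dropout, GlobalMaxPooling2D
from keras.models import Model

# base model
base_model = applications.Xception(weights='imagenet', include_top=False, input_shape=(img_width, img_height, 3))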

# top model
x = base_model.output
x = GlobalMaxPooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(4, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# Compile model
from keras import optimizers
adam = optimizers.Adam(lr=0.0001)
model.compile(loss='categorical_crossentropy',
              optimizer=adam, metrics=['accuracy'])

# LROnPlateau etc. with equivalent settings as pytorch
PyTorch model: val accuracy ~81%

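import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from xception import xception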

# modified from https://github.com/Cadene/pretrained-models.pytorch
class XCeption(nn.Module):
    def __init__(self, num_classes):
        super(XCeption, self).__init__()

        original_model = xception(pretrained="imagenet")

        self.features=nn.Sequential(*list(original_model.children())[:-1])
        self.last_linear = nn.Sequential(
             nn.Linear(original_model.last_linear.in_features, 512),
             nn.ReLU(),
             nn.Dropout(p=0.5),
             nn.Linear(512, num_classes)
        )

    def logits(self, features):
        x = F.relu(features)
        x = F.adaptive_max_pool2d(x, (1, 1))
        x = x.view(x.size(0), -1)
        x = self.last_linear(x)
        return x

    def forward(self, input):
        x = self.features(input)
        x = self.logits(x)
        return x 

device = torch.device("cuda")
model=XCeption(len(class_names))
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)
model.to(device)

criterion = nn.CrossEntropyLoss(size_average=False)
optimizer = optim.Adam(model.parameters(), lr=0.0001)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.2, patience=5, cooldown=5)
Thanks a lot, everyone.

Update: Settings:

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.2, patience=5, cooldown=5)

model = train_model(model, train_loader, val_loader, 
                        criterion, optimizer, scheduler, 
                        batch_size, trainmult=8, valmult=10, 
                        num_epochs=200, epochs_top=0)
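The two criterion settings shown above differ in how the loss is reduced over the batch: `size_average=False` is the deprecated spelling of `reduction='sum'`, while the default is `reduction='mean'`. The following is a minimal sketch of that difference on made-up tensors (the batch size, class count and random values are assumptions for illustration only). With a summed criterion, a running statistic of the form `loss.item() * labels.size(0)` would count the batch size twice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8, 4)            # hypothetical batch: 8 samples, 4 classes
labels = torch.randint(0, 4, (8,))    # class indices, not one-hot vectors

# Default criterion: averages the per-sample losses.
mean_loss = nn.CrossEntropyLoss()(logits, labels)
# Summed criterion: what the deprecated size_average=False selects.
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, labels)

# The summed loss equals the mean loss times the batch size.
print(torch.allclose(sum_loss, mean_loss * labels.size(0)))
```

Note also that `nn.CrossEntropyLoss` expects raw logits and integer class labels, whereas the Keras model ends in a softmax layer and uses `categorical_crossentropy` on one-hot labels.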
Training function:

def train_model(model, train_loader, val_loader, criterion, optimizer, scheduler, batch_size, trainmult=1, valmult=1, num_epochs=None, epochs_top=0):
  for epoch in range(num_epochs):                        
    for phase in ['train', 'val']:
        running_loss = 0.0
        running_acc = 0
        total = 0
        # Iterate over data.
        if phase=="train":
            model.train(True)  # Set model to training mode
            for i in range(trainmult):
                for data in train_loader:
                    # get the inputs
                    inputs, labels = data
                    inputs, labels = inputs.to(torch.device("cuda")), labels.to(torch.device("cuda"))
                    # zero the parameter gradients
                    optimizer.zero_grad()
                    # forward
                    outputs = model(inputs) # notinception
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    loss.backward()
                    optimizer.step()
                    # statistics                      
                    total += labels.size(0)
                    running_loss += loss.item()*labels.size(0)
                    running_acc += torch.sum(preds == labels)
                    train_loss=(running_loss/total)
                    train_acc=(running_acc.double()/total)
        else:
            model.train(False)  # Set model to evaluate mode
            with torch.no_grad():
                for i in range(valmult):
                    for data in val_loader:
                        # get the inputs
                        inputs, labels = data
                        inputs, labels = inputs.to(torch.device("cuda")), labels.to(torch.device("cuda"))
                        # zero the parameter gradients
                        optimizer.zero_grad()
                        # forward
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels.data)
                        # statistics
                        total += labels.size(0)
                        running_loss += loss.item()*labels.size(0)
                        running_acc += torch.sum(preds == labels)
                        val_loss=(running_loss/total)
                        val_acc=(running_acc.double()/total)  
            scheduler.step(val_loss)
    return model

This may be caused by the type of weight initialization you are using; otherwise this should not happen. Try using the same initializer in both models.
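As a hedged illustration of this suggestion: Keras `Dense` layers default to a Glorot-uniform kernel and zero biases, while `nn.Linear` uses a different default scheme. The sketch below applies Glorot-uniform initialization to a stand-in classifier head. The helper name `init_like_keras_dense` and the feature size 2048 are assumptions for the example, and only the newly added layers are affected, since the backbone is loaded from pretrained weights.

```python
import math
import torch.nn as nn

def init_like_keras_dense(module):
    """Give every nn.Linear Glorot-uniform weights and zero biases."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# Hypothetical head mirroring the question's classifier (2048 is an assumed feature size).
head = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 4),
)
head.apply(init_like_keras_dense)  # apply() visits every submodule

# Glorot-uniform bound for the first layer: sqrt(6 / (fan_in + fan_out)).
limit = math.sqrt(6.0 / (2048 + 512))
print(head[0].weight.abs().max().item() <= limit)
```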

self.features=nn.Sequential(*list(original_model.children())[:-1])
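Are you sure this line re-instantiates your model in exactly the same way? You are using nn.Sequential rather than the forward function of the original Xception model. If anything in that forward function is not exactly reproduced by nn.Sequential, you will not get the same performance.

Instead of wrapping the model in a Sequential, you could change it like this: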

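my_model = Xception()
# load the weights before you change the architecture
my_model = load_weights(path_to_weights)
# overwrite the original last_linear with your own
my_model.last_linear = nn.Sequential(
    nn.Linear(my_model.last_linear.in_features, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, num_classes)
)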
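To see why rebuilding a model from `children()` can change its behaviour, here is a small self-contained sketch. `TinyNet` is a hypothetical toy module, not the Xception code: it registers one ReLU but calls it twice in `forward`. `children()` yields each registered module once, in registration order, so the rebuilt `nn.Sequential` silently drops the second activation.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy model: one ReLU module is registered once but used twice in forward."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(4, 4)
        self.fc3 = nn.Linear(4, 2)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))  # second use of the same ReLU module
        return self.fc3(x)

torch.manual_seed(0)
net = TinyNet().eval()
# Rebuilt model computes fc1 -> relu -> fc2 -> fc3, with no ReLU after fc2.
rebuilt = nn.Sequential(*list(net.children())).eval()

x = torch.randn(8, 4)
with torch.no_grad():
    same = torch.allclose(net(x), rebuilt(x))

child_names = [name for name, _ in net.named_children()]
print(child_names)
print("outputs identical:", same)
```

Comparing the outputs of the original and the rebuilt model on one fixed batch, as done here, is a quick way to check whether the same thing happens with the real network.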

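This is just a guess. Have you checked the init methods? I am not sure which init Keras uses, but with the default init methods the weight values in PyTorch can grow quite large, which may lead to faster learning. Could you check the difference in training accuracy?

There is a difference between the dropout variants in PyTorch, i.e. Dropout vs. Dropout2d. Dropout2d drops entire image channels, while Dropout drops individual pixels. Could that be the problem?

Maybe the PyTorch data loader does not shuffle the training batches the way the Keras data loader does?

I have only ever seen model.eval() before, not model.train(False). Also, Kevinj22 makes a good point: are you passing shuffle=True to your train loader?

I think the default settings for Adam differ between Keras and PyTorch!

If you do not train either model, do you get the same loss during validation? Different losses would point to different architectures, and the problem would then be unrelated to training.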
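Several of these comments can be checked directly in code. The sketch below uses a small stand-in model and random data (both assumptions for illustration) to read PyTorch's Adam defaults, pass an explicit `eps`, confirm that `model.train(False)` and `model.eval()` leave the module in the same mode, and build a shuffling loader. The value `eps=1e-7` assumes the Keras run used Keras's default epsilon; verify it against your Keras version.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model, only used to have parameters to optimize.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(8, 2))

# PyTorch's Adam defaults, read from the optimizer itself.
default_adam = torch.optim.Adam(model.parameters())
print(default_adam.defaults["lr"], default_adam.defaults["betas"], default_adam.defaults["eps"])

# Assumption: the Keras optimizer used epsilon=1e-7, so pass it explicitly.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, eps=1e-7)

# model.train(False) and model.eval() both switch the module to evaluation mode.
model.train(False)
via_train_false = model.training
model.train()
model.eval()
via_eval = model.training
print(via_train_false == via_eval)

# shuffle=True reshuffles the training samples at every epoch.
dataset = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
train_loader = DataLoader(dataset, batch_size=4, shuffle=True)
print(len(train_loader))
```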