Python: Retraining a custom VGGFace model produces random results

I am trying to compare a fine-tuned VGGFace model that uses the pre-trained VGGFace weights against a fully retrained one. When I use the fine-tuned model, I get decent accuracy scores. However, when I retrain the entire model by unfreezing the weights, the accuracy becomes close to random.

I am guessing this might be because of the small dataset? I know VGGFace was trained on millions of samples, while my dataset has only 1400 samples, 700 per class, for a binary classification problem. But I just want to make sure I am combining the VGGFace model with the custom layers correctly. It could also be that the learning rate is too high.

The model is set up with the following code:

# Imports assumed by this snippet (standalone Keras plus keras_vggface)
from keras import backend as K
from keras import optimizers
from keras.models import Model
from keras.layers import Dense, Activation
from keras.callbacks import EarlyStopping
from keras_vggface.vggface import VGGFace

def Train_VGG_Model(train_layers=False):
    print('=' * 65)
    K.clear_session()
    # Load the pre-trained VGGFace (VGG16) backbone and attach a custom head
    vggface_model = VGGFace(model='vgg16')
    x = vggface_model.get_layer('fc7/relu').output
    x = Dense(512, name='custom_fc8')(x)
    x = Activation('relu', name='custom_fc8/relu')(x)
    x = Dense(64, name='custom_fc9')(x)
    x = Activation('relu', name='custom_fc9/relu')(x)
    x = Dense(1, name='custom_fc10')(x)
    out = Activation('sigmoid', name='custom_fc10/sigmoid')(x)
    custom_model = Model(vggface_model.input, out,
                         name='Custom VGGFace Model')
    # The custom head is always trainable; the pre-trained backbone is
    # trainable only when train_layers=True
    for layer in custom_model.layers:
        layer.trainable = True if 'custom_' in layer.name else train_layers
        print('Layer name:', layer.name,
              '... Trainable:', layer.trainable)
    print('=' * 65)
    opt = optimizers.Adam(lr=1e-5)
    custom_model.compile(loss='binary_crossentropy',
                         metrics=['accuracy'],
                         optimizer=opt)
    custom_model.summary()
    return custom_model

callbacks = [EarlyStopping(monitor='val_loss', patience=1, mode='auto')]
# train_layers is defined elsewhere; True means the whole network is retrained
model = Train_VGG_Model(train_layers=train_layers)
model.fit(X_train, y_train, batch_size=32, epochs=100,
          callbacks=callbacks, validation_data=(X_valid, y_valid))
Output:

Layer name: input_1 ... Trainable: True
Layer name: conv1_1 ... Trainable: True
Layer name: conv1_2 ... Trainable: True
Layer name: pool1 ... Trainable: True
Layer name: conv2_1 ... Trainable: True
Layer name: conv2_2 ... Trainable: True
Layer name: pool2 ... Trainable: True
Layer name: conv3_1 ... Trainable: True
Layer name: conv3_2 ... Trainable: True
Layer name: conv3_3 ... Trainable: True
Layer name: pool3 ... Trainable: True
Layer name: conv4_1 ... Trainable: True
Layer name: conv4_2 ... Trainable: True
Layer name: conv4_3 ... Trainable: True
Layer name: pool4 ... Trainable: True
Layer name: conv5_1 ... Trainable: True
Layer name: conv5_2 ... Trainable: True
Layer name: conv5_3 ... Trainable: True
Layer name: pool5 ... Trainable: True
Layer name: flatten ... Trainable: True
Layer name: fc6 ... Trainable: True
Layer name: fc6/relu ... Trainable: True
Layer name: fc7 ... Trainable: True
Layer name: fc7/relu ... Trainable: True
Layer name: custom_fc8 ... Trainable: True
Layer name: custom_fc8/relu ... Trainable: True
Layer name: custom_fc9 ... Trainable: True
Layer name: custom_fc9/relu ... Trainable: True
Layer name: custom_fc10 ... Trainable: True
Layer name: custom_fc10/sigmoid ... Trainable: True
=================================================================
Model: "Custom VGGFace Model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 224, 224, 64)      1792      
_________________________________________________________________
conv1_2 (Conv2D)             (None, 224, 224, 64)      36928     
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 112, 112, 64)      0         
_________________________________________________________________
conv2_1 (Conv2D)             (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2_2 (Conv2D)             (None, 112, 112, 128)     147584    
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 56, 56, 128)       0         
_________________________________________________________________
conv3_1 (Conv2D)             (None, 56, 56, 256)       295168    
_________________________________________________________________
conv3_2 (Conv2D)             (None, 56, 56, 256)       590080    
_________________________________________________________________
conv3_3 (Conv2D)             (None, 56, 56, 256)       590080    
_________________________________________________________________
pool3 (MaxPooling2D)         (None, 28, 28, 256)       0         
_________________________________________________________________
conv4_1 (Conv2D)             (None, 28, 28, 512)       1180160   
_________________________________________________________________
conv4_2 (Conv2D)             (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv4_3 (Conv2D)             (None, 28, 28, 512)       2359808   
_________________________________________________________________
pool4 (MaxPooling2D)         (None, 14, 14, 512)       0         
_________________________________________________________________
conv5_1 (Conv2D)             (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_2 (Conv2D)             (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv5_3 (Conv2D)             (None, 14, 14, 512)       2359808   
_________________________________________________________________
pool5 (MaxPooling2D)         (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc6 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc6/relu (Activation)        (None, 4096)              0         
_________________________________________________________________
fc7 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
fc7/relu (Activation)        (None, 4096)              0         
_________________________________________________________________
custom_fc8 (Dense)           (None, 512)               2097664   
_________________________________________________________________
custom_fc8/relu (Activation) (None, 512)               0         
_________________________________________________________________
custom_fc9 (Dense)           (None, 64)                32832     
_________________________________________________________________
custom_fc9/relu (Activation) (None, 64)                0         
_________________________________________________________________
custom_fc10 (Dense)          (None, 1)                 65        
_________________________________________________________________
custom_fc10/sigmoid (Activat (None, 1)                 0         
=================================================================
Total params: 136,391,105
Trainable params: 136,391,105
Non-trainable params: 0
_________________________________________________________________
Train on 784 samples, validate on 336 samples
Epoch 1/100
784/784 [==============================] - 235s 300ms/step - loss: 0.7987 - accuracy: 0.5051 - val_loss: 0.6932 - val_accuracy: 0.5149
Epoch 2/100
784/784 [==============================] - 233s 298ms/step - loss: 0.6935 - accuracy: 0.4605 - val_loss: 0.6932 - val_accuracy: 0.4792
Epoch 3/100
784/784 [==============================] - 236s 301ms/step - loss: 0.6932 - accuracy: 0.5089 - val_loss: 0.6932 - val_accuracy: 0.4792
280/280 [==============================] - 12s 45ms/step

Thanks in advance, and apologies if my question doesn't make sense. I am very new to this.

If you already have good weights that were trained on a large enough dataset, it is usually better to fine-tune/train only the last few layers and keep the earlier ones frozen.

In any convolutional NN the initial layers act as feature extractors, and a well pre-trained model has already learned good features from a sufficiently large dataset.

Once you retrain the entire model, you throw all of that away. The model will try to shift towards your new dataset, which is probably smaller than the original one and may not have a good distribution, and that can make the model perform poorly.
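One common way to apply this advice (a minimal sketch built on the question's own Train_VGG_Model(); the phase-1 epoch count and phase-2 learning rate are illustrative assumptions, not tested values) is to train the custom head with the backbone frozen first, and only then unfreeze and continue at a very low learning rate:

model = Train_VGG_Model(train_layers=False)  # phase 1: backbone frozen, head trains
model.fit(X_train, y_train, batch_size=32, epochs=10,
          callbacks=callbacks, validation_data=(X_valid, y_valid))

# phase 2: unfreeze everything and recompile with a much smaller learning rate
for layer in model.layers:
    layer.trainable = True
model.compile(loss='binary_crossentropy', metrics=['accuracy'],
              optimizer=optimizers.Adam(lr=1e-6))
model.fit(X_train, y_train, batch_size=32, epochs=100,
          callbacks=callbacks, validation_data=(X_valid, y_valid))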


If you really want to train the whole model, another thing you can try is discriminative learning rates: a very small learning rate, around 1e-5 to 1e-6, for the initial layers, and something like 1e-3 for the last layers.
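Plain Keras has no built-in support for per-layer learning rates, so this needs an add-on. A minimal sketch, assuming a tf.keras setup with the tensorflow_addons package available (neither is implied by the question's standalone-Keras code):

import tensorflow as tf
import tensorflow_addons as tfa

# Split the model into the pre-trained backbone and the new head
backbone_layers = [l for l in model.layers if 'custom_' not in l.name]
head_layers = [l for l in model.layers if 'custom_' in l.name]

# One optimizer per group: a tiny learning rate for the pre-trained
# layers, a larger one for the freshly initialized head
opt = tfa.optimizers.MultiOptimizer([
    (tf.keras.optimizers.Adam(learning_rate=1e-6), backbone_layers),
    (tf.keras.optimizers.Adam(learning_rate=1e-3), head_layers),
])
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer=opt)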

Thanks for the advice! Actually, I tried retraining the entire VGG16 model from scratch using the one from Keras instead of VGGFace, and the results were fine, i.e. above chance level. VGG16 and VGGFace should be based on the same VGG16 architecture, so I don't understand why it works well when I load the VGG16 model from Keras, but not here when I load it from VGGFace. Could it be because of the weight initialization? That is, when I unfreeze the weights, doesn't that throw away the original weights?

Thanks. So I trained again with a smaller learning rate and it worked well. I also tried using a GPU instead of a CPU, but I'm not sure whether that makes a difference. Any thoughts?

Yes, that makes sense. A GPU can play a role, but mainly by reducing training time; in some cases it can also yield slightly better performance, since GPU implementations of the algorithms are written slightly differently from the CPU ones.
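As an aside on the weight-initialization question above (a hedged note, not from the original thread): unfreezing layers does not re-initialize them; the pre-trained weights are kept and simply become updatable during training. Random weights only appear if the backbone is built without them:

# keras_vggface keeps the pre-trained weights when layers are unfrozen;
# they are only random if the model is explicitly built without them
pretrained = VGGFace(model='vgg16')                   # 'vggface' weights loaded
from_scratch = VGGFace(model='vgg16', weights=None)   # randomly initialized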