How do I decide which head network to attach when fine-tuning a CNN model?

python machine-learning keras deep-learning computer-vision

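I am experimenting with training models on a dataset of images showing plant disease symptoms.

I trained a VGG16 model from scratch, and I also trained one with transfer learning.

For transfer learning, I removed the head of a VGG16 model pre-trained on the ImageNet dataset and attached this custom head to it: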

from keras.layers import Dense, Dropout, Flatten

# base_model is the pre-trained VGG16 with its original head removed
head_model = base_model.output
head_model = Flatten(name='flatten')(head_model)
head_model = Dense(256, activation='relu')(head_model)
head_model = Dropout(0.5)(head_model)
# Add a softmax layer with one unit per class
head_model = Dense(len(class_names), activation='softmax')(head_model)

I froze all the layers in the base model and trained the head for about 25 epochs.

[INFO] evaluating after initialization...
                                        precision    recall  f1-score   support

          Tomato___Tomato_mosaic_virus       0.00      0.00      0.00       532
                 Tomato___Early_blight       0.00      0.00      0.00       239
                  Tomato___Late_blight       0.00      0.00      0.00       470
                    Tomato___Leaf_Mold       0.00      0.00      0.00       238
               Tomato___Bacterial_spot       0.00      0.00      0.00       435
                  Tomato___Target_Spot       0.00      0.00      0.00       362
Tomato___Tomato_Yellow_Leaf_Curl_Virus       0.30      1.00      0.46      1355
                      Tomato___healthy       0.00      0.00      0.00        98
           Tomato___Septoria_leaf_spot       0.00      0.00      0.00       414
      Tomato___Two-spotted_spider_mite       0.00      0.00      0.00       397

                           avg / total       0.09      0.30      0.14      4540

Then I unfroze some layers at the end of the base model and trained for another 100 epochs.

This gave better accuracy than training from scratch.
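
The two-stage schedule described above (warm up the head with the base frozen, then unfreeze the last layers and keep training) can be sketched as follows. This is a minimal sketch and not the code used in the question: it assumes TensorFlow's bundled Keras, a 224x224 RGB input, and unfreezing from layer index 15, which is where VGG16's block5 begins. The helper names and the learning rates are illustrative assumptions.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16
from tensorflow.keras.optimizers import SGD


def build_model(num_classes, weights="imagenet"):
    """Return (base_model, full_model) with the Flatten/Dense/Dropout head."""
    base_model = VGG16(weights=weights, include_top=False,
                       input_shape=(224, 224, 3))
    x = layers.Flatten(name="flatten")(base_model.output)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return base_model, Model(inputs=base_model.input, outputs=outputs)


def freeze_base(base_model):
    """Stage 1: make every layer of the convolutional base non-trainable."""
    for layer in base_model.layers:
        layer.trainable = False


def unfreeze_from(base_model, first_trainable_index):
    """Stage 2: make the layers from the given index onward trainable again."""
    for layer in base_model.layers[first_trainable_index:]:
        layer.trainable = True


def compile_model(model, learning_rate):
    """Changes to `trainable` only take effect after the model is recompiled."""
    model.compile(optimizer=SGD(learning_rate=learning_rate, momentum=0.9),
                  loss="categorical_crossentropy", metrics=["accuracy"])


# Typical usage (data loading is not shown; the learning rates are assumptions):
# base_model, model = build_model(num_classes=10)
# freeze_base(base_model);       compile_model(model, 1e-3)  # then fit ~25 epochs
# unfreeze_from(base_model, 15); compile_model(model, 1e-4)  # then fit ~100 epochs
```

A lower learning rate in the second stage is a common precaution, so that the pre-trained convolutional weights are only adjusted gently.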

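I also want to experiment with a ResNet50 model.

My question is: how do I determine a suitable head to attach? For the model above, I took the head architecture from a tutorial, but I don't really understand the reasoning behind it. For example, it uses no CONV layers, only Dense, Flatten and Dropout. Why are CONV layers not used?

How do I choose a suitable head for ResNet?

Edit

I have 100-1500 images per class. There are 10 classes in total.

Training accuracy for ResNet

After warm-up. Here I froze all the base model layers and trained only the custom head for 25 epochs: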

[INFO] evaluating after initialization...
                                        precision    recall  f1-score   support

          Tomato___Tomato_mosaic_virus       0.00      0.00      0.00       532
                 Tomato___Early_blight       0.00      0.00      0.00       239
                  Tomato___Late_blight       0.00      0.00      0.00       470
                    Tomato___Leaf_Mold       0.00      0.00      0.00       238
               Tomato___Bacterial_spot       0.00      0.00      0.00       435
                  Tomato___Target_Spot       0.00      0.00      0.00       362
Tomato___Tomato_Yellow_Leaf_Curl_Virus       0.30      1.00      0.46      1355
                      Tomato___healthy       0.00      0.00      0.00        98
           Tomato___Septoria_leaf_spot       0.00      0.00      0.00       414
      Tomato___Two-spotted_spider_mite       0.00      0.00      0.00       397

                           avg / total       0.09      0.30      0.14      4540
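
A quick arithmetic check helps in reading the report above: its numbers are what a classifier produces when it assigns all 4540 images to Tomato___Tomato_Yellow_Leaf_Curl_Virus. The sketch below recomputes the metrics from the support column alone. The constant-prediction assumption is inferred from the recall of 1.00 for one class and 0.00 for all others, and the variable names are illustrative.

```python
# Support (number of true samples) per class, copied from the report.
support = {
    "Tomato___Tomato_mosaic_virus": 532,
    "Tomato___Early_blight": 239,
    "Tomato___Late_blight": 470,
    "Tomato___Leaf_Mold": 238,
    "Tomato___Bacterial_spot": 435,
    "Tomato___Target_Spot": 362,
    "Tomato___Tomato_Yellow_Leaf_Curl_Virus": 1355,
    "Tomato___healthy": 98,
    "Tomato___Septoria_leaf_spot": 414,
    "Tomato___Two-spotted_spider_mite": 397,
}

# Assumption: the model outputs this one class for every image.
predicted_class = "Tomato___Tomato_Yellow_Leaf_Curl_Virus"

total = sum(support.values())

# Metrics for the single predicted class.
precision = support[predicted_class] / total  # correct / all predictions
recall = 1.0                                  # every true sample is found
f1 = 2 * precision * recall / (precision + recall)

# Support-weighted averages: all other classes contribute 0.
weight = support[predicted_class] / total
avg_precision = precision * weight
avg_recall = recall * weight
avg_f1 = f1 * weight

print(f"total support: {total}")
print(f"class row    : {precision:.2f} {recall:.2f} {f1:.2f}")
print(f"avg / total  : {avg_precision:.2f} {avg_recall:.2f} {avg_f1:.2f}")
# total support: 4540
# class row    : 0.30 1.00 0.46
# avg / total  : 0.09 0.30 0.14
```

An accuracy of about 0.30 at this point therefore reflects the share of the largest class, not any learned separation between the classes.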

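After the warm-up, I tried gradually unfreezing some layers. These are the results:

Layers from layer 165 onward unfrozen (ran for 60 epochs)

Layers from layer 161 onward unfrozen (ran for about 50 epochs)

Layers from layer 168 onward unfrozen (ran for 50 epochs): val_accuracy = 0.30198

So val_accuracy rarely increases, and sometimes it even decreases.

By comparison, the accuracy with VGG16 was very high.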

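Any established CNN model, such as VGG16, ResNet or Inception, has a well-tested architecture that is the result of many hours of training and experimentation. There is therefore usually no need to change the existing architecture or to add more CONV layers to it. The only part that needs to change is the fully connected layers, or simply the "head". Since the head only processes the feature values produced by the convolutional layers, at most 2 Dense layers are enough in most cases. I have observed that with more than 2 Dense layers the model becomes heavy (training time and memory use increase).

Choosing a suitable head for ResNet, or for any other well-known architecture, is quite subjective. Try not to change the architecture itself; vary the head instead. I prefer to use GlobalAveragePooling2D().
One advantage of global average pooling over fully connected layers is that it is more native to the convolutional structure, because it enforces a correspondence between feature maps and categories. The feature maps can therefore be readily interpreted as category confidence maps.
Fully connected layers, by contrast, are prone to overfitting, which hurts the generalization ability of the whole network.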

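from keras.layers import Activation, Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model


def add_new_last_layer(base_model, nb_classes):
    """Attach a global-average-pooling head to base_model and return the full model."""
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024)(x)
    x = Activation('relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(512)(x)
    x = Activation('relu')(x)
    predictions = Dense(nb_classes, activation='softmax')(x)
    # Creating final model
    model = Model(inputs=base_model.input, outputs=predictions)
    return model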
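
To put the overfitting argument in numbers, the sketch below counts the parameters of the first Dense layer of a head when it follows Flatten versus global average pooling (GAP). It is an illustration under stated assumptions: a 224x224 RGB input, for which the headless VGG16 and ResNet50 bases output 7x7x512 and 7x7x2048 feature maps respectively, and a first Dense layer of 256 units as in the question's head. The function names are hypothetical.

```python
# Output shape (height, width, channels) of each convolutional base for a
# 224x224 RGB input with the original classifier removed.
FEATURE_MAPS = {
    "VGG16": (7, 7, 512),
    "ResNet50": (7, 7, 2048),
}


def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


def first_dense_params(feature_map, pooling, n_units=256):
    """Parameters of the first Dense layer placed after Flatten or GAP."""
    height, width, channels = feature_map
    if pooling == "flatten":
        n_inputs = height * width * channels  # every activation is an input
    elif pooling == "gap":
        n_inputs = channels                   # one average per feature map
    else:
        raise ValueError(f"unknown pooling: {pooling!r}")
    return dense_params(n_inputs, n_units)


for name, feature_map in FEATURE_MAPS.items():
    flat = first_dense_params(feature_map, "flatten")
    gap = first_dense_params(feature_map, "gap")
    print(f"{name}: Flatten -> Dense(256) = {flat:,}, GAP -> Dense(256) = {gap:,}")
```

Under these assumptions the Flatten head needs 6,422,784 parameters on VGG16 and 25,690,368 on ResNet50, against 131,328 and 524,544 with GAP. With only 100-1500 images per class, the larger layer is much easier to overfit, which is one reason GAP heads are commonly paired with ResNet50.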

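Hello, and thanks for the reply. The VGG16 model reached a validation accuracy of 0.96+ with the head shown above. I warmed up the head for 25 epochs, then unfroze the layers from layer 15 onward and trained for another 100 epochs. But when I try the same thing with ResNet50, the validation accuracy does not go above 0.4, even after unfreezing from layer 158. Why the difference? Should I unfreeze more layers at the top?

In this case I need more information: what is the training accuracy for ResNet50? What is the image size? How many training images are there? It looks like your head has overfitted.

May I suggest training your ResNet50 model completely, without freezing any layers? It may take some time. Since VGG16 works well, I think ResNet50 is too big. I assume your images are RGB and at least 224x224. If not ResNet50, you could try ResNet18.

I resized the images to 224x224; they were originally larger than that. So should I unfreeze all the ResNet layers at once?

Yes, we can try that. And I don't think it is necessary to warm up these layers, since this is transfer learning.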