Lower accuracy due to overfitting in multi-class classification when downgrading from tensorflow 2.3.1 to tensorflow 1.14 or 1.15


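I wrote a script in tensorflow 2.x, but I had to port it down to tensorflow 1.x (tested on 1.14 and 1.15). However, the tf1 version performs very differently: accuracy on the test set is 10% lower. See also the training and validation performance plots (attached below).

Looking at the steps required to migrate from tf1 to tf2, only the Adam learning rate seems like a possible issue, but I am defining it explicitly.

I reproduced the same behaviour locally on GPU and CPU, and on Colab. The Keras used is the one built into TensorFlow (tf.keras). I used the following function (for training, validation and testing), with sparse (integer) class labels: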

The model is a simple ResNet50 with a new layer on top:

# Assumed imports (not shown in the original post)
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import preprocess_input as preprocess_input_resnet50
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D, Input
from tensorflow.keras.models import Model

# img_size, num_classes and learning_rate are defined elsewhere in the script
IMG_SHAPE = img_size + (3,)
inputs = Input(shape=IMG_SHAPE, name='image_input', dtype=tf.uint8)
x = tf.cast(inputs, tf.float32)

# not working in this version of keras. inserted in imageGenerator
x = preprocess_input_resnet50(x)

base_model = tf.keras.applications.ResNet50(
    include_top=False,
    input_shape=IMG_SHAPE,
    pooling=None,
    weights='imagenet')
# Freeze the pretrained weights
base_model.trainable = False
x = base_model(x)

# Rebuild top
x = GlobalAveragePooling2D(data_format='channels_last', name="avg_pool")(x)

top_dropout_rate = 0.2
x = Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = Dense(num_classes, activation="softmax", name="pred_out")(x)
model = Model(inputs=inputs, outputs=outputs, name="ResNet50_comp")

optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])

Then I call the fit function:

history = model.fit_generator(train_dataset, 
                    steps_per_epoch=n_train_batches, 
                    validation_data=validation_dataset, 
                    validation_steps=n_val_batches,
                    epochs=initial_epochs,
                    verbose=1,
                    callbacks=[stopping])

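For example, I reproduced the same behaviour with the following complete script (applied to my dataset, switched to Adam, and with the intermediate final dense layer removed):

The simplest way to reproduce this behaviour is to take the same script in a tf2 environment and enable or disable the following line, after adding it to the script. However, I also tested in tf1 environments (1.14 and 1.15):

Unfortunately, I cannot provide the dataset.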

Update (26 Nov 2020):
For full reproducibility, I obtained similar behaviour on the food101 dataset (101 classes), enabling tf1 behaviour with "tf.compat.v1.disable_v2_behavior()". Here is the script, executed with tensorflow-gpu 2.2.0:

#%% ref https://medium.com/deeplearningsandbox/how-to-use-transfer-learning-and-fine-tuning-in-keras-and-tensorflow-to-build-an-image-recognition-94b0b02444f2
import os
import sys
import glob
import argparse
import matplotlib.pyplot as plt
import tensorflow as tf
# enable and disable this to obtain tf1 behaviour
tf.compat.v1.disable_v2_behavior()
from tensorflow.keras import __version__
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam

# since i'm using resnet50 weights from imagenet, i'm using food101 for
# similar but different categorization tasks
# pip install tensorflow-datasets if tensorflow_dataset not found
import tensorflow_datasets as tfds

(train_ds, validation_ds), info = tfds.load('food101',
                                            split=['train', 'validation'],
                                            shuffle_files=True,
                                            with_info=True)
assert isinstance(train_ds, tf.data.Dataset)
print(train_ds)

#%%
IM_WIDTH, IM_HEIGHT = 224, 224
NB_EPOCHS = 10
BAT_SIZE = 32


def get_nb_files(directory):
    """Get number of files by searching directory recursively"""
    if not os.path.exists(directory):
        return 0
    cnt = 0
    for r, dirs, files in os.walk(directory):
        for dr in dirs:
            cnt += len(glob.glob(os.path.join(r, dr + "/*")))
    return cnt


def setup_to_transfer_learn(model, base_model):
    """Freeze all layers and compile the model"""
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer='rmsprop',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])


def add_new_last_layer(base_model, nb_classes):
    """Add last layer to the convnet
    Args:
      base_model: keras model excluding top
      nb_classes: # of classes
    Returns:
      new keras model with last layer
    """
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    #x = Dense(FC_SIZE, activation='relu')(x) #new FC layer, random init
    predictions = Dense(nb_classes, activation='softmax')(x)  #new softmax layer
    model = Model(inputs=base_model.input, outputs=predictions)
    return model


def train(nb_epoch, batch_size):
    """Use transfer learning and fine-tuning to train a network on a new dataset"""
    #nb_train_samples = train_ds.cardinality().numpy()
    nb_train_samples = info.splits['train'].num_examples
    nb_classes = info.features['label'].num_classes
    classes_names = info.features['label'].names
    #nb_val_samples = validation_ds.cardinality().numpy()
    nb_val_samples = info.splits['validation'].num_examples
    #nb_epoch = int(args.nb_epoch)
    #batch_size = int(args.batch_size)

    def preprocess(features):
        #print(features['image'], features['label'])
        image = tf.image.resize(features['image'], [224, 224])
        #image = tf.divide(image, 255)
        #print(image)
        # data augmentation
        image = tf.image.random_flip_left_right(image)
        image = preprocess_input(image)
        label = features['label']
        # for categorical crossentropy
        #label = tf.one_hot(label,101,axis=-1)
        #return image, tf.cast(label, tf.float32)
        return image, label

    #pre-processing the dataset to fit a specific image size and 2D labelling
    train_generator = train_ds.map(preprocess).batch(batch_size).repeat()
    validation_generator = validation_ds.map(preprocess).batch(batch_size).repeat()
    #train_generator=train_ds
    #validation_generator=validation_ds
    #fig = tfds.show_examples(validation_generator, info)

    # setup model
    base_model = ResNet50(weights='imagenet', include_top=False)  #include_top=False excludes final FC layer
    model = add_new_last_layer(base_model, nb_classes)

    # transfer learning
    setup_to_transfer_learn(model, base_model)

    history = model.fit(
        train_generator,
        epochs=nb_epoch,
        steps_per_epoch=nb_train_samples//BAT_SIZE,
        validation_data=validation_generator,
        validation_steps=nb_val_samples//BAT_SIZE)
        #class_weight='auto')

    # return the History object so the caller's assignment below is not None
    return history


#execute
history = train(nb_epoch=NB_EPOCHS, batch_size=BAT_SIZE)
And here is the performance on the food101 dataset:
Update (27 Nov 2020):
The difference can also be seen with the smaller oxford_flowers102 dataset:

(train_ds,validation_ds,test_ds),info= tfds.load('oxford_flowers102', split=['train','validation','test'], shuffle_files=True, with_info=True)

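Note: the plots above show the confidence obtained by running the same training several times and evaluating the mean and std, to check the effect of random weight initialization and data augmentation.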
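For anyone who wants to repeat that mean/std check, here is a minimal sketch of how per-epoch statistics across repeated runs could be computed. The helper name `summarize_runs` and the numbers are hypothetical placeholders, not values from the question; it assumes each run's validation accuracy was collected as a plain list (for instance from `history.history`), and it uses the sample standard deviation.

```python
from statistics import mean, stdev


def summarize_runs(runs):
    """Return a list of (mean, std) tuples, one per epoch, across repeated runs.

    runs: list of per-run accuracy lists, all with the same number of epochs.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    n_epochs = len(runs[0])
    if any(len(r) != n_epochs for r in runs):
        raise ValueError("all runs must cover the same number of epochs")
    # zip(*runs) groups the values of all runs epoch by epoch
    return [(mean(vals), stdev(vals)) for vals in zip(*runs)]


# Placeholder numbers for illustration only, not measured results.
example_runs = [
    [0.50, 0.60, 0.70],
    [0.52, 0.62, 0.68],
    [0.48, 0.58, 0.72],
]
for epoch, (m, s) in enumerate(summarize_runs(example_runs), start=1):
    print(f"epoch {epoch}: val_accuracy = {m:.3f} +/- {s:.3f}")
```

If the tf1 and tf2 bands computed this way do not overlap, the gap is unlikely to be explained by random initialization or augmentation alone.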
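In addition, I tried some hyperparameter tuning on tf2, with the results shown below:

changing the optimizer (adam and rmsprop)

not applying the horizontal-flip augmentation

deactivating the Keras ResNet50 preprocess_input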

Thanks in advance for any suggestions. Here are the accuracy and validation performance of tf1 and tf2 on my dataset:

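Update (14 Dec 2020):
I am sharing a Colab for one-click reproducibility on Oxford Flowers:

I ran into something similar when doing the opposite migration (from TF1+Keras to TF2).
Running the code below: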

# using TF2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 212.3205274187726

# using TF1+Keras
import numpy as np
from keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 187.23898954353717
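You can see that the same model from the same library, in different versions, does not return the same values (using sum as a quick check). I found the explanation for this puzzling behaviour in another answer.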
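A sum is only a coarse check: two outputs can have equal sums and still differ element by element. As a rough sketch (the helper `compare_outputs` and its tolerances are my own assumptions, not part of either library), the flattened outputs saved from each environment could be compared like this, for example after converting them with `out.tolist()`:

```python
import math


def compare_outputs(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Compare two flattened model outputs element by element.

    a, b: sequences of floats of equal length.
    The tolerances are illustrative defaults, not recommended values.
    """
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    max_abs_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    all_close = all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
    return {
        "sum_a": math.fsum(a),
        "sum_b": math.fsum(b),
        "max_abs_diff": max_abs_diff,
        "all_close": all_close,
    }


# Toy vectors: same sum, different elements, so the sum check alone would pass.
report = compare_outputs([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(report)
```

Different sums do prove the outputs differ, as in the snippet above; the element-wise report just adds how large the largest discrepancy is.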

Another suggestion I would give you: try using the pooling built into the applications.resnet50.ResNet50 class rather than an additional layer in your function, for simplicity and to rule out a possible source of problems :)

Hello. Please don't link to the images; put them inline instead. Also, I'm curious what made you downgrade from 2 to 1? Why did you set class_mode to int rather than sparse? Can you share some reproducible code?

Dear @M.Innat, thanks for your prompt reply. Sparse is the correct configuration; I have corrected it in the post. Thanks for pointing it out. I can't post images directly until I have 10 reputation points (I'm new to Stack Overflow). For reproducible code, I linked the Deep Learning Sandbox code, which gives similar results (on my dataset). Unfortunately I can't share my private dataset, but I may try MNIST in the future and update you with fully reproducible code.

Understood. Please share reproducible code with MNIST or any dummy set. The main issue may lie deeper, and without reproducible code it is hard for others to debug.

Thanks for the suggestion. I added a new section to the question with reproducible material applied to food101 (the dataset most similar to my scenario).