Accuracy drops due to overfitting in multi-class classification when downgrading from TensorFlow 2.3.1 to TensorFlow 1.14 or 1.15
I wrote a script in TensorFlow 2.x, but I had to port it down to TensorFlow 1.x (tested on 1.14 and 1.15). However, the TF1 version performs very differently: accuracy on the test set is 10% lower. See also the training and validation performance plots (attached below).
Looking at what is required to migrate from TF1 to TF2, only the Adam learning rate seems like a possible issue, but I am defining it explicitly.
I reproduced the same behavior locally on GPU and CPU, and on Colab. The Keras used is the one built into TensorFlow (tf.keras). I used the following functions (for training, validation, and testing), with sparse (integer) class labels:
The model is a plain ResNet50 with a new layer on top:
IMG_SHAPE = img_size+(3,)
inputs = Input(shape=IMG_SHAPE, name='image_input',dtype = tf.uint8)
x = tf.cast(inputs, tf.float32)
# not working in this version of keras. inserted in imageGenerator
x = preprocess_input_resnet50(x)
base_model = tf.keras.applications.ResNet50(
    include_top=False,
    input_shape=IMG_SHAPE,
    pooling=None,
    weights='imagenet')
# Freeze the pretrained weights
base_model.trainable = False
x=base_model(x)
# Rebuild top
x = GlobalAveragePooling2D(data_format='channels_last',name="avg_pool")(x)
top_dropout_rate = 0.2
x = Dropout(top_dropout_rate, name="top_dropout")(x)
outputs = Dense(num_classes,activation="softmax", name="pred_out")(x)
model = Model(inputs=inputs, outputs=outputs,name="ResNet50_comp")
optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])
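For readers unsure what the "sparse" loss above expects, here is a minimal pure-Python sketch of the per-sample computation. It is a simplification (Keras also clips probabilities for numerical stability), and the function names and example values are illustrative assumptions, not part of the original script.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample: negative log of the probability of the true class."""
    return -math.log(probs[label])

def categorical_crossentropy(one_hot, probs):
    """The same loss when the label is supplied as a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

probs = [0.7, 0.2, 0.1]    # hypothetical softmax output for 3 classes
label = 1                  # integer ("sparse") label
one_hot = [0.0, 1.0, 0.0]  # the same label, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Both calls give the same value, which is why integer labels from the generator pair with the sparse loss while one-hot labels pair with the non-sparse one.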
Then I call the fit function:
history = model.fit_generator(train_dataset,
                              steps_per_epoch=n_train_batches,
                              validation_data=validation_dataset,
                              validation_steps=n_val_batches,
                              epochs=initial_epochs,
                              verbose=1,
                              callbacks=[stopping])
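For example, I reproduced the same behavior with the following complete script (applied to my dataset, switched to Adam, and with the intermediate final dense layer removed):
The simplest way to reproduce this behavior is to run the same script in a TF2 environment with the following line added, toggling it on or off. However, I also tested in TF1 environments (1.14 and 1.15):
Unfortunately, I cannot provide the dataset.
Update, 26 November 2020
For full reproducibility, I obtained similar behavior with the food101 dataset (101 classes), enabling TF1 behavior with "tf.compat.v1.disable_v2_behavior()". Below is the script, run with tensorflow-gpu 2.2.0: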
#%% ref https://medium.com/deeplearningsandbox/how-to-use-transfer-learning-and-fine-tuning-in-keras-and-tensorflow-to-build-an-image-recognition-94b0b02444f2
import os
import sys
import glob
import argparse
import matplotlib.pyplot as plt
import tensorflow as tf
# enable and disable this to obtain tf1 behaviour
tf.compat.v1.disable_v2_behavior()
from tensorflow.keras import __version__
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
# since i'm using resnet50 weights from imagenet, i'm using food101 for
# similar but different categorization tasks
# pip install tensorflow-datasets if tensorflow_dataset not found
import tensorflow_datasets as tfds
(train_ds,validation_ds),info= tfds.load('food101', split=['train','validation'], shuffle_files=True, with_info=True)
assert isinstance(train_ds, tf.data.Dataset)
print(train_ds)
#%%
IM_WIDTH, IM_HEIGHT = 224, 224
NB_EPOCHS = 10
BAT_SIZE = 32
def get_nb_files(directory):
    """Get number of files by searching directory recursively"""
    if not os.path.exists(directory):
        return 0
    cnt = 0
    for r, dirs, files in os.walk(directory):
        for dr in dirs:
            cnt += len(glob.glob(os.path.join(r, dr + "/*")))
    return cnt
def setup_to_transfer_learn(model, base_model):
    """Freeze all layers and compile the model"""
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
def add_new_last_layer(base_model, nb_classes):
    """Add last layer to the convnet
    Args:
        base_model: keras model excluding top
        nb_classes: # of classes
    Returns:
        new keras model with last layer
    """
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    #x = Dense(FC_SIZE, activation='relu')(x) #new FC layer, random init
    predictions = Dense(nb_classes, activation='softmax')(x) #new softmax layer
    model = Model(inputs=base_model.input, outputs=predictions)
    return model
def train(nb_epoch, batch_size):
    """Use transfer learning and fine-tuning to train a network on a new dataset"""
    #nb_train_samples = train_ds.cardinality().numpy()
    nb_train_samples=info.splits['train'].num_examples
    nb_classes = info.features['label'].num_classes
    classes_names = info.features['label'].names
    #nb_val_samples = validation_ds.cardinality().numpy()
    nb_val_samples = info.splits['validation'].num_examples
    #nb_epoch = int(args.nb_epoch)
    #batch_size = int(args.batch_size)
    def preprocess(features):
        #print(features['image'], features['label'])
        image = tf.image.resize(features['image'], [224,224])
        #image = tf.divide(image, 255)
        #print(image)
        # data augmentation
        image=tf.image.random_flip_left_right(image)
        image = preprocess_input(image)
        label = features['label']
        # for categorical crossentropy
        #label = tf.one_hot(label,101,axis=-1)
        #return image, tf.cast(label, tf.float32)
        return image, label
    #pre-processing the dataset to fit a specific image size and 2D labelling
    train_generator = train_ds.map(preprocess).batch(batch_size).repeat()
    validation_generator = validation_ds.map(preprocess).batch(batch_size).repeat()
    #train_generator=train_ds
    #validation_generator=validation_ds
    #fig = tfds.show_examples(validation_generator, info)
    # setup model
    base_model = ResNet50(weights='imagenet', include_top=False) #include_top=False excludes final FC layer
    model = add_new_last_layer(base_model, nb_classes)
    # transfer learning
    setup_to_transfer_learn(model, base_model)
    history = model.fit(
        train_generator,
        epochs=nb_epoch,
        steps_per_epoch=nb_train_samples//BAT_SIZE,
        validation_data=validation_generator,
        validation_steps=nb_val_samples//BAT_SIZE)
        #class_weight='auto')
    # return the History object so the caller's assignment below is not None
    return history
#execute
history = train(nb_epoch=NB_EPOCHS, batch_size=BAT_SIZE)
And the performance on the food101 dataset:
Update, 27 November 2020
The difference can also be seen with the smaller Oxford Flowers 102 dataset:
(train_ds,validation_ds,test_ds),info= tfds.load('oxford_flowers102', split=['train','validation','test'], shuffle_files=True, with_info=True)
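Note: the plots above show the confidence obtained by running the same training several times and computing the mean and standard deviation, to check the effect of random weight initialization and data augmentation.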
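A minimal sketch of how such a mean and standard deviation over repeated runs could be computed. The helper name and the accuracy values are placeholders for illustration only; they are not results from the question.

```python
import statistics

def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Placeholder final validation accuracies from five repeated runs (not real results).
example_runs = [0.80, 0.82, 0.81, 0.79, 0.83]
mean_acc, std_acc = summarize_runs(example_runs)
print(f"val accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

If the gap between the TF1 and TF2 means is several times larger than either standard deviation, random initialization and augmentation alone are unlikely to explain it.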
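In addition, I tried some hyperparameter tuning on TF2, with the results shown below:
- changing the optimizer (Adam and RMSprop)
- not applying horizontal flip augmentation
- disabling the Keras ResNet50 preprocess_input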
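One hypothetical way to enumerate these variations systematically is sketched below. The post does not say whether every combination was actually run, and the dictionary keys are illustrative names, not parameters of any library.

```python
from itertools import product

optimizers = ["adam", "rmsprop"]
flip_augmentation = [True, False]
use_preprocess_input = [True, False]

# Build one configuration dict per combination of the three switches.
configs = [
    {"optimizer": opt, "flip": flip, "preprocess_input": pre}
    for opt, flip, pre in product(optimizers, flip_augmentation, use_preprocess_input)
]
for cfg in configs:
    print(cfg)
```

Each dict could then drive one training run, so that TF1 and TF2 are compared under identical settings.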
Accuracy and validation performance of TF1 and TF2:
Update, 14 December 2020
I am sharing a Colab for one-click reproducibility on Oxford Flowers:
I ran into something similar when doing the opposite migration (from TF1+Keras to TF2). Run the code below:
# using TF2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 212.3205274187726
# using TF1+Keras
import numpy as np
from keras.applications.resnet50 import ResNet50
fe = ResNet50(include_top=False, pooling="avg")
out = fe.predict(np.ones((1,224,224,3))).flatten()
sum(out)
>>> 187.23898954353717
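You can see that the same model from the same library in different versions does not return the same values (using sum as a quick check). I found the explanation for this mysterious behavior in another answer.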
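Differing sums are enough to show the outputs differ, but equal sums would not prove they match. A stricter element-wise comparison might look like the sketch below; the helper name, tolerances, and placeholder vectors are assumptions for illustration. With NumPy arrays, np.allclose serves the same purpose.

```python
import math

def compare_outputs(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Return (all_close, max_abs_diff) for two equally long feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    all_close = all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )
    return all_close, max_abs_diff

# Placeholder vectors; in practice these would be the two flattened `out`
# arrays produced by the TF2 and TF1+Keras snippets.
out_a = [0.5, 1.5, 2.0]
out_b = [1.5, 0.5, 2.0]  # same sum as out_a, different values

print(sum(out_a) == sum(out_b))       # the sums agree
print(compare_outputs(out_a, out_b))  # the element-wise check still reports a mismatch
```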
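Another suggestion: try using the pooling option inside the applications.resnet50.ResNet50 class rather than an additional layer in your function, both for simplicity and to remove a possible source of problems :)
Hi. Please don't link to the images; embed them in the post instead. Also, I'm curious what made you downgrade from 2 to 1? Why did you set class_mode to int rather than sparse? Can you share some reproducible code?
Dear @M.Innat, thank you for your prompt reply. sparse is the correct setting, and I have corrected it in the post. Thanks for pointing it out. I can't post images directly until I have 10 reputation points (I'm new to Stack Overflow). For reproducible code, I linked the Deep Learning Sandbox code, which gives similar results (on my dataset). Unfortunately I can't share my private dataset, but I may try it on MNIST later and update you with fully reproducible code.
Understood. Please share reproducible code with MNIST or any dummy set. The underlying problem may be deeper, and without reproducible code it is hard for others to debug.
Thanks for the suggestion. I added a new section to the question with reproducible material applied to food101 (the dataset most similar to my scenario).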