Python: my own Fast R-CNN implementation does not train well on balanced data
Tags: python, tensorflow, keras, faster-rcnn

2020.06.09
There are 700 images for training. 64 ROIs are extracted from each image and form one mini-batch, so with a batch size of 2 it takes 350 steps to go through the training set. With R-CNN, by contrast, each ROI is cropped into a separate image and resized to 224*224, which gives 64*700 = 44800 images, each carrying more information and features than a 7*7 pooled feature map. I guess this is why the model does not seem to fit well, even though R-CNN trains fine on the same data.
==========================================================================
With fully balanced data, acc drops to 0.53 (on the training data). I think the network is just guessing rather than learning.
==========================================================================
2020.06.08
I followed this structure, which is used in many GitHub repos, but acc does not improve:
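import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# RoiPoolingConv is the custom ROI pooling layer from this project (not shown here).
# slash is the path separator variable defined elsewhere in the project.


def build_model():
    pooled_square_size = 7
    num_rois = 32
    # One row of 4 coordinates per ROI.
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    model_cnn = tf.keras.applications.VGG16(
        include_top=True,
        weights='imagenet'
    )
    # layers[17] is block5_conv3, the last convolution of VGG16.
    x = model_cnn.layers[17].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    # Classification head, applied to every ROI independently.
    x = TimeDistributed(Flatten())(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(4096, activation='selu'))(x)
    x = TimeDistributed(Dropout(0.5))(x)
    x = TimeDistributed(Dense(2, activation='softmax', kernel_initializer='zero'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    # learning_rate replaces the deprecated lr argument.
    opt = Adam(learning_rate=0.0001)
    model_final.compile(
        loss=tf.keras.losses.CategoricalCrossentropy(),
        optimizer=opt,
        metrics=["accuracy"]
    )
    model_final.save("TrainedModels" + slash + "FastRCNN.h5")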
Training log:
100/100 [==============================] - ETA: 0s - loss: 0.5556 - accuracy: 0.7681
Epoch 00001: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 412ms/step - loss: 0.5556 - accuracy: 0.7681
Epoch 2/100
100/100 [==============================] - ETA: 0s - loss: 0.5223 - accuracy: 0.7910
Epoch 00002: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 41s 414ms/step - loss: 0.5223 - accuracy: 0.7910
Epoch 3/100
100/100 [==============================] - ETA: 0s - loss: 0.5340 - accuracy: 0.7797
Epoch 00003: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 416ms/step - loss: 0.5340 - accuracy: 0.7797
Epoch 4/100
100/100 [==============================] - ETA: 0s - loss: 0.5309 - accuracy: 0.7825
Epoch 00004: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 427ms/step - loss: 0.5309 - accuracy: 0.7825
Epoch 5/100
100/100 [==============================] - ETA: 0s - loss: 0.5257 - accuracy: 0.7840
Epoch 00005: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 434ms/step - loss: 0.5257 - accuracy: 0.7840
Epoch 6/100
100/100 [==============================] - ETA: 0s - loss: 0.5181 - accuracy: 0.7928
Epoch 00006: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 423ms/step - loss: 0.5181 - accuracy: 0.7928
Epoch 7/100
100/100 [==============================] - ETA: 0s - loss: 0.5483 - accuracy: 0.7712
Epoch 00007: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 42s 418ms/step - loss: 0.5483 - accuracy: 0.7712
Epoch 8/100
100/100 [==============================] - ETA: 0s - loss: 0.5282 - accuracy: 0.7832
Epoch 00008: saving model to TrainedModels\FastRCNN.h5
100/100 [==============================] - 43s 429ms/step - loss: 0.5282 - accuracy: 0.7832
Epoch 9/100
100/100 [==============================] - ETA: 0s - loss: 0.5385 - accuracy: 0.7765
Epoch 00009: saving model to TrainedModels\FastRCNN.h5
Reference:
==========================================================================
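I wrote a binary classification model based on Fast R-CNN to detect airplanes in photos. The training dataset is generated by selective search. When I use a dataset with a negative/positive ratio of about 1, the model only reaches about 0.6 acc on the training set. When I make the N/P ratio higher, closer to the original ratio produced by selective search, training acc can reach 0.9, but the model performs poorly when predicting on the test dataset.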
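The accuracy figures above are easier to interpret next to the majority-class baseline, i.e. the accuracy of a model that always predicts the more frequent class. The following is a minimal sketch of that comparison, not part of the original code: the helper names `subsample_negatives` and `majority_baseline` and the toy label counts are assumptions made up for illustration.

```python
import random


def subsample_negatives(labels, neg_pos_ratio, seed=0):
    """Keep all positives and about neg_pos_ratio negatives per positive.

    Hypothetical helper: returns the sorted indices of the kept samples.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), round(neg_pos_ratio * len(pos)))
    return sorted(pos + rng.sample(neg, n_neg))


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    n_pos = sum(labels)
    return max(n_pos, len(labels) - n_pos) / len(labels)


# Toy proposal labels: 1 = airplane, 0 = background (made-up counts).
labels = [1] * 10 + [0] * 90

balanced = [labels[i] for i in subsample_negatives(labels, neg_pos_ratio=1.0)]
skewed = [labels[i] for i in subsample_negatives(labels, neg_pos_ratio=4.0)]

print(majority_baseline(balanced))  # 0.5
print(majority_baseline(skewed))    # 0.8
print(majority_baseline(labels))    # 0.9
```

Under these assumptions, an accuracy close to the baseline for the chosen N/P ratio suggests the classifier may have learned little beyond the class prior, which would also be consistent with poor results on test data.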
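During training, the training acc stays the same after every epoch, and when I looked at TensorBoard I saw that the weights of the layers do not change from epoch to epoch:
This is the basic structure of my model. Feature extraction is done by VGG16, which outputs a 28*28 feature map to the ROI pooling layer. I tried changing the activation from ReLU to SELU, but it did not help:
Here are the input image and its feature map before the ROI pooling layer (28*28*512) and after it (32*14*14*512):
I generated the model with this code: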
def build_model():
    # pooled_square_size is defined outside this snippet; the 14*14 ROI
    # output described above implies a value of 14.
    num_rois = 32
    roi_input = Input(shape=(num_rois, 4), name="input_2")
    model_cnn = tf.keras.applications.VGG16(
        include_top=True,
        weights='imagenet'
    )
    # layers[13] is block4_conv3, which outputs the 28*28*512 feature map.
    x = model_cnn.layers[13].output
    x = RoiPoolingConv(pooled_square_size, roi_input.shape[1])([x, roi_input])
    # Reuse the remaining VGG16 layers (block5 onwards) on every ROI.
    for layer in model_cnn.layers[15:]:
        x = TimeDistributed(layer)(x)
    x = TimeDistributed(Dense(512, activation='sigmoid'))(x)
    x = TimeDistributed(Dense(2, activation='softmax'))(x)
    model_final = Model(inputs=[model_cnn.input, roi_input], outputs=x)
    opt = Adam(lr=0.0001)
    model_final.compile(
        loss=tf.keras.losses.BinaryCrossentropy(),
        optimizer=opt,
        metrics=["accuracy"]
    )
    model_final.save("TrainedModels" + slash + "FastRCNN.h5")
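The complete code can be seen here:
I have tried adding BatchNormalization, tuning the learning rate, and simply adding more layers, but the model does not improve at all. I would be grateful if someone could point out the key flaw in this model so that I can test it further. Thanks.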
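For readers less familiar with the ROI pooling step this model depends on, here is a rough pure-Python sketch of the idea: a rectangular region of a feature map is divided into a fixed grid and the maximum of each cell is kept. It is not the `RoiPoolingConv` layer from the question. The function name `roi_max_pool`, the single-channel input, and the integer, end-exclusive `(x1, y1, x2, y2)` coordinates are simplifying assumptions.

```python
def roi_max_pool(feature_map, roi, pooled_size):
    """Max-pool one ROI of a 2-D feature map into a pooled_size x pooled_size grid.

    feature_map: list of rows (single channel).
    roi: (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    width, height = x2 - x1, y2 - y1
    pooled = []
    for py in range(pooled_size):
        # Row range of this bin; every bin covers at least one cell.
        r0 = y1 + (py * height) // pooled_size
        r1 = max(r0 + 1, y1 + ((py + 1) * height) // pooled_size)
        row = []
        for px in range(pooled_size):
            # Column range of this bin, computed the same way.
            c0 = x1 + (px * width) // pooled_size
            c1 = max(c0 + 1, x1 + ((px + 1) * width) // pooled_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# 4x4 toy feature map whose value at (row, col) is 4 * row + col.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]

print(roi_max_pool(fmap, (0, 0, 4, 4), 2))  # [[5, 7], [13, 15]]
print(roi_max_pool(fmap, (1, 1, 3, 3), 2))  # [[5, 6], [9, 10]]
```

A real layer applies the same operation per channel and per ROI on batched tensors, so each ROI has to be paired with the feature map of the image it came from.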
I strongly suspect that something odd is going on with this VGG16:
This is an input image:
This is its corresponding output feature map:
Damn, now I know what the problem was. In ROI_Pooling.py:
def call(self, x, mask=None):
    assert (len(x) == 2)
    # x[0] is image with shape (rows, cols, channels)
    img = x[0]
    # x[1] is roi with shape (num_rois,4) with ordering (x1,y1,x2,y2)
    rois = x[1]
    input_shape = img.shape
    outputs = []
    # Slice over the whole batch: every batch element keeps its own ROIs.
    x1 = rois[:, :, 0]
    y1 = rois[:, :, 1]
    x2 = rois[:, :, 2]
    y2 = rois[:, :, 3]
It used to be:
def call(self, x, mask=None):
    assert (len(x) == 2)
    # x[0] is image with shape (rows, cols, channels)
    img = x[0]
    # x[1] is roi with shape (num_rois,4) with ordering (x,y,w,h)
    rois = x[1]
    input_shape = img.shape
    outputs = []
    for roi_idx in range(self.num_rois):
        # Bug: the batch index is hard-coded to 0.
        x1 = rois[0, roi_idx, 0]
        y1 = rois[0, roi_idx, 1]
        x2 = rois[0, roi_idx, 2]
        y2 = rois[0, roi_idx, 3]
You can clearly see that only the ROIs of the first element of each batch were used to produce the results.
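The difference between the two indexing styles can be reproduced with plain nested lists. This is only an illustration under assumptions: the ROI values are made up, `x1_buggy` and `x1_fixed` are hypothetical names, and the buggy variant assumes the coordinates read from batch element 0 end up being applied to every element of the batch.

```python
# rois[b][r] = (x1, y1, x2, y2) for ROI r of batch element b (toy values).
rois = [
    [(0, 0, 4, 4), (2, 2, 6, 6)],  # batch element 0
    [(1, 1, 5, 5), (3, 3, 7, 7)],  # batch element 1
]
num_rois = 2


def x1_buggy(rois):
    """Mirrors rois[0, roi_idx, 0]: the batch index is fixed to 0."""
    return [[rois[0][r][0] for r in range(num_rois)] for _ in rois]


def x1_fixed(rois):
    """Mirrors rois[:, :, 0]: each batch element uses its own ROIs."""
    return [[roi[0] for roi in sample] for sample in rois]


print(x1_buggy(rois))  # [[0, 2], [0, 2]]
print(x1_fixed(rois))  # [[0, 2], [1, 3]]
```

With the hard-coded index, the second batch element is pooled at the coordinates of the first one, so its labels no longer correspond to the regions the network actually sees.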
Now the results are much better: