Incompatible tensor shapes when training an object detection model in TensorFlow Keras

Tags: tensorflow, machine-learning, keras, deep-learning, object-detection

I am trying to extend a basic classification model () into a simple object detection model for a single object.

The classification model simply classifies handwritten digits in images where the digit fills most of the image. To create a meaningful dataset for object detection, I use the MNIST dataset as a base and transform it into a new dataset with the following steps:

  • Increase the image canvas size from 28x28 to 100x100
  • Move the handwritten digit to a random position within the 100x100 image
  • Create the ground-truth bounding box

    [Figure 1: Illustration of steps 1 and 2]

    [Figure 2: Some generated ground-truth bounding boxes]

    The output vector of the model is inspired by the YOLO definition, but for a single object:

    y = [p, x, y, w, h, c0, ..., c9]
    
    where p = the probability of an object, (x, y, w, h) = the bounding-box center, width, and height as fractions of the image size, and c0-c9 = the class probabilities (one per digit).
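
    For concreteness, a hypothetical example target vector (the numbers are made up purely for illustration) for a digit 3 whose box is centered at (0.40, 0.55) and spans 30% of the image in each dimension:

    #         p    x     y     w     h    c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
    y_true = [1.0, 0.40, 0.55, 0.30, 0.30, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]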

    So, to change the classification model into an object detection model, I simply replaced the final softmax layer with a fully connected layer of 15 nodes (one per value in y) and wrote a custom loss function that compares the prediction against the ground truth.

    However, when I try to train the model, I get the cryptic error tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15] vs. [200], where [15] is the number of nodes in my last layer and [200] is the batch size I specified for training (I verified this by changing the values and running again). The two don't necessarily have to be the same, so I suspect I am missing something important about the tensor dimensions in the model, but I can't figure out what.

    Note: my understanding of batching is that it determines how many samples (images) the model processes at a time during training, so it is reasonable that the batch size should be an even divisor of the training data size. But nothing connects it to the number of output nodes in the model.

    Any help is appreciated.

    Here is the complete code:

    import numpy as np
    
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import Dropout
    from keras.layers import Flatten
    from keras.layers.convolutional import Conv2D
    from keras.layers.convolutional import MaxPooling2D
    from keras import backend as K
    
    
    def increase_image_size(im_set, new_size):
        num_images = im_set.shape[0]
        orig_size = im_set[0].shape[0]
        im_stack = np.zeros((num_images, new_size, new_size), dtype='uint8')
    
        # Put MNIST digits at random positions in new images
        for i in range(num_images):
            x0 = int(np.random.random() * (new_size - orig_size - 1))
            y0 = int(np.random.random() * (new_size - orig_size - 1))
            x1 = x0 + orig_size
            y1 = y0 + orig_size
    
            im_stack[i, y0:y1, x0:x1] = im_set[i]
    
        return im_stack
    
    
    # Get bounding box annotations from images and object labels
    def get_image_annotations(X_train, y_train):
        num_images = len(X_train)
        annotations = np.zeros((num_images, 15), dtype='float')
        for i in range(num_images):
            annotations[i] = get_image_annotation(X_train[i], y_train[i])
        return annotations
    
    
    def get_image_annotation(X, y):
        sz_y, sz_x = X.shape
    
        y_indices, x_indices = np.where(X > 0)
    
        y_min = max(np.min(y_indices) - 1, 0)
        y_max = min(np.max(y_indices) + 1, sz_y)
        x_min = max(np.min(x_indices) - 1, 0)
        x_max = min(np.max(x_indices) + 1, sz_x)
    
        bb_x = (x_min + x_max) / 2.0 / sz_x
        bb_y = (y_min + y_max) / 2.0 / sz_y
    
        bb_w = (x_max - x_min) / sz_x
        bb_h = (y_max - y_min) / sz_y
    
        classes = np.zeros(10, dtype='float')
        classes[y] = 1
    
        output = np.concatenate(([1, bb_x, bb_y, bb_w, bb_h], classes))
        return output
    
    
    def custom_cost_function(y_true, y_pred):
        p_p = y_pred[0]
        x_p = y_pred[1]
        y_p = y_pred[2]
        w_p = y_pred[3]
        h_p = y_pred[4]
    
        p_t = y_true[0]
        x_t = y_true[1]
        y_t = y_true[2]
        w_t = y_true[3]
        h_t = y_true[4]
    
        c_pred = y_pred[5:]
        c_true = y_true[5:]
    
        c1 = K.sum((c_pred - c_true) * (c_pred - c_true))
        c2 = (x_p - x_t) * (x_p - x_t) + (y_p - y_t) * (y_p - y_t) \
             + (K.sqrt(w_p) - K.sqrt(w_t)) * (K.sqrt(w_p) - K.sqrt(w_t)) \
             + (K.sqrt(h_p) - K.sqrt(h_t)) * (K.sqrt(h_p) - K.sqrt(h_t))
    
        lambda_class = 1.0
        lambda_coord = 1.0
    
        return lambda_class * c1 + lambda_coord * c2
    
    
    def baseline_model():
        # create model
        model = Sequential()
        model.add(Conv2D(32, (5, 5), input_shape=(1, 100, 100), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.2))
        model.add(Flatten())
        model.add(Dense(128, activation='relu'))
        model.add(Dense(15, activation='linear'))
        # Compile model
        model.compile(loss=custom_cost_function, optimizer='adam', metrics=['accuracy'])
        return model
    
    
    def mnist_object_detection():
        K.set_image_dim_ordering('th')
    
        # fix random seed for reproducibility
        np.random.seed(7)
    
        # Load data
        print("Loading data")
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
    
        # Adjust input images
        print("Adjust input images (increasing image sizes and moving digits)")
        X_train = increase_image_size(X_train, 100)
        X_test = increase_image_size(X_test, 100)
    
        print("Creating annotations")
        y_train_prim = get_image_annotations(X_train, y_train)
        y_test_prim = get_image_annotations(X_test, y_test)
        print("...done")
    
        # reshape to be [samples][pixels][width][height]
        X_train = X_train.reshape(X_train.shape[0], 1, 100, 100).astype('float32')
        X_test = X_test.reshape(X_test.shape[0], 1, 100, 100).astype('float32')
    
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
    
        # build the model
        print("Building model")
        model = baseline_model()
        # Fit the model
        print("Training model")
        model.fit(X_train, y_train_prim, validation_data=(X_test, y_test_prim), epochs=10, batch_size=200, verbose=1)
    
    
    if __name__ == '__main__':
        mnist_object_detection()
    
    When I run it, I get this error:

    /Users/gedda/anaconda3/envs/keras-obj-det/bin/pythonn /Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py
    Using TensorFlow backend.
    Loading data
    Adjust input images (increasing image sizes and moving digits)
    Creating annotations
    ...done
    Building model
    2018-11-30 13:26:34.030159: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
    2018-11-30 13:26:34.030463: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
    Training model
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/3
    Traceback (most recent call last):
      File "/Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py", line 140, in <module>
        mnist_object_detection()
      File "/Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py", line 136, in mnist_object_detection
        model.fit(X_train, y_train_prim, validation_data=(X_test, y_test_prim), epochs=3, batch_size=200, verbose=1)
      File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
      File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
      File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
      File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
      File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
      File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15] vs. [200]
         [[{{node training/Adam/gradients/loss/dense_2_loss/mul_7_grad/BroadcastGradientArgs}} = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Shape, training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Shape_1)]]
    
    Process finished with exit code 1
    
    
    The first dimension of all tensors is the batch size. Indexing y_pred[0] therefore selects the first sample in the batch rather than the p value, so the loss mixes the batch axis with the output axis, and that is where the shapes [15] and [200] end up colliding.
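
    A small sketch of the shapes involved (assuming the model above, with batch_size=200 and 15 output nodes):

    from keras import backend as K

    # Keras hands the loss function 2-D tensors of shape
    # (batch_size, num_outputs), here (200, 15)
    y_pred = K.placeholder(shape=(200, 15))

    print(K.int_shape(y_pred[0]))     # (15,)  -> first sample, all 15 outputs
    print(K.int_shape(y_pred[:, 0]))  # (200,) -> first output, whole batch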

    Your loss should probably operate on the second dimension instead:

    def custom_cost_function(y_true, y_pred):
        p_p = y_pred[:,0]
        x_p = y_pred[:,1]
        y_p = y_pred[:,2]
        w_p = y_pred[:,3]
        h_p = y_pred[:,4]
    
        p_t = y_true[:,0]
        x_t = y_true[:,1]
        y_t = y_true[:,2]
        w_t = y_true[:,3]
        h_t = y_true[:,4]
    
        c_pred = y_pred[:,5:]
        c_true = y_true[:,5:]
    
        ........
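        # For completeness, a minimal sketch of how the elided part could be
        # filled in (an assumption based on the question's own loss, not the
        # answerer's exact code): the same terms as before, with the class
        # term summed over axis -1 so every intermediate keeps the batch as
        # its first axis. The unused p values are omitted.
        c1 = K.sum(K.square(c_pred - c_true), axis=-1)
        c2 = K.square(x_p - x_t) + K.square(y_p - y_t) \
             + K.square(K.sqrt(w_p) - K.sqrt(w_t)) \
             + K.square(K.sqrt(h_p) - K.sqrt(h_t))

        lambda_class = 1.0
        lambda_coord = 1.0

        # Shape (batch_size,): Keras averages the per-sample losses itself
        return lambda_class * c1 + lambda_coord * c2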
    

    Thanks! It works now. I would never have guessed that was the problem.