TensorFlow: convolutional neural network with an LSTM in place of the fully connected layer


I am trying to build a neural network for baseball that detects where the ball and the strike zone are as the ball crosses the plate, but my network seems to be stuck in a local minimum: it returns the same values for every item in the dataset.

My approach is to capture 34 frames as the ball approaches the plate and to use those frames to detect when the ball crosses it.

The outputs are: ball left, ball top, ball width, strike zone left, strike zone top, strike zone width, strike zone height, and the frame in which the ball crosses the plate.

My model runs a convolutional network over each frame, but instead of a fully connected layer at the end it uses an LSTM, so the network can infer things from previous frames. I need to infer from previous frames because the ball is sometimes not visible, either because the pitcher is in front of it or because it is in the catcher's glove.
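
For context, the idea described above (a shared CNN applied to every frame, with an LSTM in place of the fully connected head) can be sketched in tf.keras roughly as follows; the layer sizes here are illustrative and are not necessarily the ones used in the code below:

    import tensorflow as tf

    def build_sketch(sequence_length=34, height=180, width=320, num_outputs=8):
        # One pitch = a sequence of grayscale frames.
        frames = tf.keras.Input(shape=(sequence_length, height, width, 1))

        # A small shared CNN, applied identically to every frame.
        cnn = tf.keras.Sequential([
            tf.keras.layers.Conv2D(16, 5, padding='same', activation='relu'),
            tf.keras.layers.MaxPool2D(),
            tf.keras.layers.Conv2D(36, 5, padding='same', activation='relu'),
            tf.keras.layers.MaxPool2D(),
            tf.keras.layers.Flatten(),
        ])
        features = tf.keras.layers.TimeDistributed(cnn)(frames)

        # The LSTM carries evidence forward, so frames where the ball is
        # hidden can still be interpreted using earlier frames.
        hidden = tf.keras.layers.LSTM(256)(features)

        # Linear output: box coordinates and the crossing frame are a
        # regression target, not class labels.
        return tf.keras.Model(frames, tf.keras.layers.Dense(num_outputs)(hidden))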

Judging by the cost over time, the network seems to be stuck in a local minimum and produces the same result for every pitch in the training set.

Here is my code:

    filter_size1 = 5  # Convolution filters are 5 x 5 pixels.
    num_filters1 = 16  # There are 16 of these filters.

    filter_size2 = 5  # Convolution filters are 5 x 5 pixels.
    num_filters2 = 36  # There are 36 of these filters.

    filter_size3 = 5  # Convolution filters are 5 x 5 pixels.
    num_filters3 = 36  # There are 36 of these filters.

    num_hidden = 256
    lstm_layers = 2

    num_channels = 1

    num_classes = 10

    width = 320
    height = 180

    sequence_length = Directories.Pitch_Sequence_Length

    x = tf.placeholder(tf.float32, shape=[None, sequence_length, width, height], name='x')

    keep_prob = tf.placeholder(tf.float32, name="keep_prob")

    x_image = tf.reshape(x, [-1, width, height, num_channels])
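    # x_image is [batch * sequence_length, width, height, 1]: the batch and
    # time dimensions are merged so one shared CNN processes every frame.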

    y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')

    layer_conv1, weights_conv1, biases_conv1 = ConvolutionNeuralNetwork.new_conv_layer(input=x_image,
                                                                         num_input_channels=num_channels,
                                                                         filter_size=filter_size1,
                                                                         num_filters=num_filters1,
                                                                         use_pooling=True)

    layer_conv2, weights_conv2, biases_conv2 = ConvolutionNeuralNetwork.new_conv_layer(input=layer_conv1, num_input_channels=num_filters1,
                                                filter_size=filter_size2, num_filters=num_filters2,
                                                use_pooling=True)

    layer_conv3, weights_conv3, biases_conv3 = ConvolutionNeuralNetwork.new_conv_layer(input=layer_conv2, num_input_channels=num_filters2,
                                                filter_size=filter_size3, num_filters=num_filters3,
                                                use_pooling=True)

    layer_flat, num_features = ConvolutionNeuralNetwork.flatten_layer(layer_conv3)
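    # layer_flat is [batch * sequence_length, num_features]; the reshape
    # below restores the time axis so the LSTM sees one vector per frame.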

    fc_sequence = tf.reshape(layer_flat, [-1, sequence_length, int(layer_flat.shape[1])])

    # Build an independent cell per layer: reusing a single cell object via
    # [cell] * lstm_layers makes the layers share weights (and raises a
    # variable-reuse error in newer TF 1.x releases).
    def lstm_cell():
        cell = tf.contrib.rnn.BasicLSTMCell(num_hidden)
        return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(lstm_layers)])

    outputs, states = tf.contrib.rnn.static_rnn(cell, tf.unstack(tf.transpose(fc_sequence, perm=[1, 0, 2])), dtype=tf.float32)

    self.y_pred = ConvolutionNeuralNetwork.new_fc_layer(outputs[-1], num_hidden, num_classes, use_relu=True)

    self.cost = tf.reduce_mean(tf.pow(self.y_pred - y_true, 2))

    self.optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(self.cost)
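
As an aside, if I read the TF 1.x API correctly, the transpose/unstack/static_rnn combination above can be replaced with tf.nn.dynamic_rnn, which consumes the batch-major [batch, time, features] tensor directly and avoids statically unrolling all 34 timesteps. A sketch reusing the names from above:

    # Equivalent recurrence with dynamic_rnn: no transpose/unstack needed,
    # because it accepts [batch, time, features] input directly.
    outputs, states = tf.nn.dynamic_rnn(cell, fc_sequence, dtype=tf.float32)
    last_output = outputs[:, -1, :]  # hidden state after the final frame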
And here is the ConvolutionNeuralNetwork.py module:

    import tensorflow as tf

    def new_weights(shape):
        return tf.Variable(tf.truncated_normal(shape, stddev=0.05))


    def new_biases(length):
        return tf.Variable(tf.constant(0.05, shape=[length]))


    def new_conv_layer(input, num_input_channels, filter_size, num_filters, use_pooling=True, weights=None, biases=None):
        shape = [filter_size, filter_size, num_input_channels, num_filters]

        if weights is None:
            weights = new_weights(shape=shape)

        if biases is None:
            biases = new_biases(length=num_filters)

        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1, 1, 1, 1], padding='SAME')

        layer += biases

        if use_pooling:
            layer = tf.nn.max_pool(value=layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        layer = tf.nn.relu(layer)
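        # Note: pooling before ReLU is equivalent to ReLU before pooling
        # (ReLU is monotonic, so it commutes with max), but this way the
        # ReLU runs on a tensor a quarter of the size.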

        return layer, weights, biases

    def flatten_layer(layer):
        layer_shape = layer.get_shape()

        num_features = layer_shape[1:4].num_elements()

        layer_flat = tf.reshape(layer, [-1, num_features])

        return layer_flat, num_features


    def flatten_layer_multiple(layer1, layer2):
        layer = tf.concat([layer1, layer2], 1)
        layer_shape = layer.get_shape()

        num_features = layer_shape[1:4].num_elements()

        layer_flat = tf.reshape(layer, [-1, num_features])

        return layer_flat, num_features


    def new_fc_layer(input, num_inputs, num_outputs, use_relu=True, keep_prob=None):
        weights = new_weights(shape=[num_inputs, num_outputs])
        biases = new_biases(length=num_outputs)

        layer = tf.matmul(input, weights) + biases

        if use_relu:
            layer = tf.nn.relu(layer)
            if keep_prob is not None:
                # Keep the returned tensor: tf.nn.dropout does not modify
                # `layer` in place.
                layer = tf.nn.dropout(layer, keep_prob)

        return layer
Here is my learning curve (cost over time):

Here is an example of what a result should look like:

And here is what the neural network predicted:


The network draws the strike zone box and the ball box in exactly the same position for every pitch. Does anyone know what I am doing wrong?

Do you really need the ReLU on the last FC layer?

Hmm, I can try getting rid of it. Thanks.
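
For anyone who finds this later: a ReLU on the output layer of a regression head can die (once every pre-activation goes negative, the output is zero for all inputs and no gradient flows back), which matches the identical-boxes-for-every-pitch symptom. With the code above, removing it is a one-argument change:

    # Same output head, but linear instead of ReLU, since box coordinates
    # and the crossing frame are regression targets.
    self.y_pred = ConvolutionNeuralNetwork.new_fc_layer(outputs[-1], num_hidden,
                                                        num_classes, use_relu=False)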