Keras: stateful LSTM with mini-batches and variable-timestep input?
Tags: deep-learning, keras, lstm, recurrent-neural-network

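I'm new to Keras and am trying to implement this network.

The network takes video frames as input x = {x1, ..., xT}, where T is the number of frames in the video and each xt is the visual feature vector of a frame, of size 2048.

I tried using a stateful LSTM because each example consists of many frames.

This is my model: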

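from keras.layers import Input, LSTM, Dense, concatenate
from keras.models import Model
from keras.optimizers import SGD

x = Input(batch_shape=(1, None, 2048), name='x')
lstmR = LSTM(256, return_sequences=True, name='lstmR', stateful=True)(x)
lstmL = LSTM(256, return_sequences=True, go_backwards=True, name='lstmL', stateful=True)(x)
# Keras 2 spelling of merge(..., mode='concat'); the result gets its own name
# so it no longer shadows the function being called
merged = concatenate([x, lstmR, lstmL], name='merge')
dense = Dense(256, activation='sigmoid', name='dense')(merged)
y = Dense(1, activation='sigmoid', name='y')(dense)
model = Model(inputs=x, outputs=y)
model.compile(loss='mean_squared_error',
              optimizer=SGD(lr=0.01),  # newer Keras versions spell this learning_rate
              metrics=['accuracy'])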

and I tried to train the model using mini-batches:

import numpy as np

for epoch in range(15):
    mean_tr_acc = []
    mean_tr_loss = []
    for i in range(nb_samples):
        x, y = get_train_sample(i)  # one video's frames and its target
        for j in range(len(x)):
            # one frame per call, shaped (batch=1, timesteps=1, features)
            sample_x = np.expand_dims(np.expand_dims(x[j], axis=0), axis=0)
            tr_loss, tr_acc = model.train_on_batch(sample_x, np.expand_dims(y, axis=0))
            mean_tr_acc.append(tr_acc)
            mean_tr_loss.append(tr_loss)
        model.reset_states()  # clear the LSTM state before the next video
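
To make the shapes in the loop above explicit, here is a minimal NumPy-only sketch (no Keras required). The feature size 2048 comes from the question; the frame count T = 5 and the random data are assumptions for illustration only. Because the Input layer declares the timestep dimension as None, a whole video could in principle also be passed as a single sequence, which the second array shows.

```python
import numpy as np

T, FEATURES = 5, 2048  # T is a made-up frame count; 2048 is from the question
video = np.random.rand(T, FEATURES).astype("float32")  # stand-in for one video's features

# What the loop feeds per call: one frame, shaped (batch=1, timesteps=1, features)
one_frame = np.expand_dims(np.expand_dims(video[0], axis=0), axis=0)

# The same video as a single sequence, shaped (batch=1, timesteps=T, features)
whole_video = np.expand_dims(video, axis=0)

print(one_frame.shape)    # (1, 1, 2048)
print(whole_video.shape)  # (1, 5, 2048)
```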

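But the model does not seem to converge: its accuracy stays at 0.3.

I also tried a stateless LSTM with input shape (None, 1024), but that did not converge much either.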

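I think your LSTM is unable to extract relevant features from the video frames, so it cannot reach good accuracy.

When working with images (or video frames), the approach that usually gives the best results is to extract features with a stack of convolution + ReLU + max-pooling layers (see surveys of facial expression recognition: they all use convolutions to extract useful features from images).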

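These layers work best on two-dimensional input, but I see that you represent each video frame as an array of size 2048 rather than as a matrix. Usually an image has a shape like (rows, cols, color_channels). In your case the input would have shape (1, None, rows, cols, color_channels), and the convolutions would then look like this: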

from keras.layers import Input, LSTM, Conv2D, MaxPool2D, TimeDistributed, Flatten

x = Input(batch_shape=(1, None, rows, cols, color_channels), name='x')
convs = TimeDistributed(Conv2D(16, kernel_size=(3,3), activation='relu', padding='same'))(x)
convs = TimeDistributed(MaxPool2D(pool_size=(2,2)))(convs)
convs = TimeDistributed(Conv2D(32, kernel_size=(3,3), activation='relu', padding='same'))(convs)
convs = TimeDistributed(MaxPool2D(pool_size=(2,2)))(convs)
lstm_input = TimeDistributed(Flatten())(convs)
lstmR = LSTM(256, return_sequences=True, name='lstmR', stateful=True)(lstm_input)
lstmL = LSTM(256, return_sequences=True, go_backwards=True, name='lstmL', stateful=True)(lstm_input)
...

where TimeDistributed applies the given layer to every time step.
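
As a rough check on what the LSTMs would receive from such a stack, the per-frame feature length after flattening can be worked out by hand. The helper below is a hypothetical sketch, not part of Keras. It assumes 'same'-padded convolutions (spatial size unchanged) and 2x2 max pooling with the default 'valid' padding (each pool floor-divides rows and cols by 2). The 64x64 frame size is made up for illustration.

```python
def flattened_size(rows, cols, conv_filters=32, num_pools=2, pool=2):
    """Per-frame feature length after the conv/pool stack sketched above.

    Assumes 'same'-padded convolutions (no change in spatial size) and
    non-overlapping pooling with 'valid' padding (floor division).
    conv_filters is the filter count of the last convolution.
    """
    for _ in range(num_pools):
        rows //= pool
        cols //= pool
    return rows * cols * conv_filters

# Hypothetical 64x64 frames: 16 * 16 * 32 features per time step
print(flattened_size(64, 64))  # 8192
```

With frames that large, each time step would carry more features than the original 2048-element vector, so extra pooling or a smaller frame size may be worth considering.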