Python: Keras gets stuck on the first epoch when training a CNN-LSTM

python tensorflow keras google-colaboratory

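I am using Keras to build a CNN-LSTM model for tweet classification. The model takes two inputs, and the task is three-class classification. The code I use to build the model is shown below: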

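from keras import regularizers
from keras.layers import Input, Conv2D, Flatten, Reshape, LSTM, Dense, concatenate
from keras.models import Model
from keras.optimizers import Adam

def conv2d_lstm_with_author():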

    # Get the input information - author & tweet
    author_repre_input = Input(shape=(100,), name='author_input')
    tweet_input = Input(shape=(13, 100, 1), name='tweet_input')

    # Create the convolutional layer and lstm layer
    conv2d = Conv2D(filters = 200, kernel_size = (2, 100), padding='same', activation='relu', 
                    use_bias=True, name='conv_1')(tweet_input)
    flat = Flatten(name='flatten_1')(conv2d)
    reshape_flat = Reshape((260000, 1), name='reshape_1')(flat)
    lstm = LSTM(100, return_state=False, activation='tanh', recurrent_activation='hard_sigmoid', name='lstm_1')(reshape_flat)
    concatenate_layer = concatenate([lstm, author_repre_input], axis=1, name='concat_1')
    dense_1 = Dense(10, activation='relu', name='dense_1')(concatenate_layer)
    output = Dense(3, activation='softmax', kernel_regularizer=regularizers.l2(0.01), name='output_dense')(dense_1)

    # Build the model
    model = Model(inputs=[author_repre_input, tweet_input], outputs=output)
    return model

model = conv2d_lstm_with_author()
model.summary()

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

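The shapes of my two inputs and of the labels are:

author_repre_input: (40942, 100)
tweet_input: (40942, 13, 100, 1)
My labels Train_Y: (40942, 3)

A snapshot of the model summary is: [image]

When I train on the data with the following code: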

model.fit([author_repre_input, tweet_input], [Train_Y], epochs=20, batch_size=32, validation_split=0.2, 
          shuffle=False, verbose=2)

training stays stuck in the first epoch, and the log shows nothing useful, only:

Epoch 1/20

I would like to know why this happens. The TensorFlow and Keras versions I am using are:

tensorflow-1.14.0
keras-2.2.0

Thank you very much for your time!

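Update, January 20

I tried training the model on Google Colab and watched the RAM while it ran. Colab allocated 25 GB of RAM to me. However, after a few seconds of training the session crashed because it had used all of the available memory.

I think something must be wrong in the model itself. Any suggestions and insights would be greatly appreciated.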
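A rough back-of-the-envelope estimate may help explain the memory behaviour described in the update. The sketch below is only an approximation: it assumes float32 values and that some number of tensors of shape (batch_size, units) are kept per timestep for backpropagation through time. The function name `lstm_activation_bytes` and the figure of six tensors per step are illustrative assumptions, not measurements of what Keras actually allocates.

```python
# Rough estimate of the memory needed to keep per-timestep LSTM tensors.
# Assumptions (not from the original post): float32 values, and
# `tensors_per_step` tensors of shape (batch_size, units) stored per timestep.

def lstm_activation_bytes(timesteps, batch_size, units,
                          tensors_per_step=1, bytes_per_value=4):
    """Approximate bytes needed to store per-timestep LSTM tensors."""
    return timesteps * batch_size * units * tensors_per_step * bytes_per_value

# Flatten of the (13, 100, 200) Conv2D output, as in Reshape((260000, 1)).
timesteps = 13 * 100 * 200

one_tensor = lstm_activation_bytes(timesteps, batch_size=32, units=100)
print("one tensor per step: %.1f GiB" % (one_tensor / 2**30))

# Hypothetical count: four gate activations plus cell and hidden state.
six_tensors = lstm_activation_bytes(timesteps, batch_size=32, units=100,
                                    tensors_per_step=6)
print("six tensors per step: %.1f GiB" % (six_tensors / 2**30))
```

Under these assumptions a single stored tensor per step already costs about 3 GiB for one batch, so a sequence of 260000 timesteps could plausibly exhaust the 25 GB that Colab provides.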

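Fortunately, you are not stuck.

The root of the problem is that in your model.fit call you specified the parameter verbose=2.

This means your code only prints a message at the end of each epoch and shows no progress information while training is running.

To fix this and see the training progress, set verbose=1.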
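With verbose=1 the progress bar advances once per batch, so it can be useful to know how many batches one epoch contains. The sketch below is plain Python, not a Keras call. It assumes the split point is the truncated integer of samples × (1 − validation_split), which reproduces the 32753 / 8189 sample counts reported in this thread. The helper name `split_and_steps` is made up for illustration.

```python
import math

def split_and_steps(num_samples, validation_split, batch_size):
    """Return (train_samples, validation_samples, batches_per_epoch)."""
    # Assumed split rule: the last `validation_split` fraction is held out.
    split_at = int(num_samples * (1.0 - validation_split))
    train_samples = split_at
    val_samples = num_samples - split_at
    # The final batch may be smaller than batch_size, hence the ceiling.
    batches_per_epoch = math.ceil(train_samples / batch_size)
    return train_samples, val_samples, batches_per_epoch

train_samples, val_samples, batches_per_epoch = split_and_steps(40942, 0.2, 32)
print(train_samples, val_samples, batches_per_epoch)
```

If even the first of these batches never completes, the bottleneck is probably the model itself rather than the logging setting.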

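I think I have found the answer.

The problem is in the convolutional layer. The kernel size was too small, which made the dimensionality of the layer's output too high. To fix this, I changed the kernel size from (2, 100) to (3, 100). In addition, I added dropout to my model. The model I am using now is summarized below: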

def conv2d_lstm_with_author():

    # Get the input information - author & tweet
    author_repre_input = Input(shape=(100,), name='author_input')
    tweet_input = Input(shape=(13, 100, 1), name='tweet_input')

    # Create the convolutional layer and lstm layer
    conv2d = Conv2D(filters = 200, kernel_size = (2, 100), padding='same', activation='relu', 
                    use_bias=True, name='conv_1')(tweet_input)
    flat = Flatten(name='flatten_1')(conv2d)
    reshape_flat = Reshape((260000, 1), name='reshape_1')(flat)
    lstm = LSTM(100, return_state=False, activation='tanh', recurrent_activation='hard_sigmoid', name='lstm_1')(reshape_flat)
    concatenate_layer = concatenate([lstm, author_repre_input], axis=1, name='concat_1')
    dense_1 = Dense(10, activation='relu', name='dense_1')(concatenate_layer)
    output = Dense(3, activation='softmax', kernel_regularizer=regularizers.l2(0.01), name='output_dense')(dense_1)

    # Build the model
    model = Model(inputs=[author_repre_input, tweet_input], outputs=output)
    return model

model = conv2d_lstm_with_author()
model.summary()

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

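Now the model runs smoothly in Google Colab.

So if you run into a similar problem, I suggest checking the output dimensions of each layer. If the model produces very high-dimensional outputs, the Keras API may stall during the training phase.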
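The advice to check each layer's output dimensions can be tried out with plain arithmetic before building anything. The sketch below applies the standard 'same' and 'valid' padding rules for a stride-1 convolution to the shapes from the question. The helper names are hypothetical, and the 'valid' rows are only an illustration of how padding affects the result, not a confirmed description of the final model.

```python
import math

def conv2d_output_shape(input_hw, kernel_hw, filters,
                        padding="same", strides=(1, 1)):
    """Output (height, width, channels) of a 2D convolution layer."""
    dims = []
    for size, kernel, stride in zip(input_hw, kernel_hw, strides):
        if padding == "same":
            # 'same' pads the input so only the stride shrinks the output.
            dims.append(math.ceil(size / stride))
        elif padding == "valid":
            # 'valid' uses no padding, so the kernel size shrinks the output.
            dims.append(math.ceil((size - kernel + 1) / stride))
        else:
            raise ValueError("padding must be 'same' or 'valid'")
    return (dims[0], dims[1], filters)

def flattened_length(shape):
    """Number of values per sample after a Flatten layer."""
    total = 1
    for dim in shape:
        total *= dim
    return total

for padding in ("same", "valid"):
    for kernel in ((2, 100), (3, 100)):
        shape = conv2d_output_shape((13, 100), kernel, 200, padding=padding)
        print(padding, kernel, shape, flattened_length(shape))
```

With padding='same' both kernel sizes give a (13, 100, 200) output, which is 260000 values after flattening, so under that setting the kernel size alone does not shrink the sequence fed to the LSTM. With padding='valid' the same kernels give 2400 and 2200 values, which is why padding and strides are worth checking along with the kernel size.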

Wow, thanks for the quick reply! But when I change the verbose setting to verbose=1, all I see is:

Train on 32753 samples, validate on 8189 samples
Epoch 1/20

OK, this still means your problem is solved. You need to wait a while for training to start.

Maybe... I don't have any GPU... but thanks for your answer.

I don't know whether this helps, but even on Google Colab you can increase the RAM to 32 GB once the session has crashed, so extending the RAM may help. Please check whether you are using the GPU or the TPU, because the GPU is faster than the TPU in Google Colab.

According to Stack Overflow rules, you had better ask a separate question about this.

If that is the case, then how do you train object detection in Keras? :s