Python: CNN model for multi-class classification is overfitting

Tags: python, keras, deep-learning, conv-neural-network, recurrent-neural-network

I'm trying to train a CNN model (also an RNN) using GloVe embeddings. The dataset is labelled data: text (tweets) and labels (hate, offensive, or neither).

The problem is that the model performs well on the training set but poorly on the validation set.

The model is as follows:

import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import (Activation, Conv1D, Dense, Dropout, Embedding,
                          LSTM, MaxPooling1D)

kernel_size = 2
filters = 256
pool_size = 2
gru_node = 64

model = Sequential()
# Trainable GloVe embedding layer
model.add(Embedding(len(word_index) + 1,
                    EMBEDDING_DIM,
                    weights=[embedding_matrix],
                    input_length=MAX_SEQUENCE_LENGTH,
                    trainable=True))
model.add(Dropout(0.25))
model.add(Conv1D(filters, kernel_size, activation='relu'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(filters, kernel_size, activation='softmax'))
model.add(MaxPooling1D(pool_size=pool_size))
# Recurrent part: stacked LSTM layers (despite the gru_node name)
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, recurrent_dropout=0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dense(nclasses))
model.add(Activation('softmax'))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

X = df.tweet
y = df['classifi']    # classes 0, 1, 2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
X_train_Glove, X_test_Glove, word_index, embeddings_index = loadData_Tokenizer(X_train, X_test)

# Build_Model_RCNN_Text presumably returns the layer stack defined above
model_RCNN = Build_Model_RCNN_Text(word_index, embeddings_index, 20)

model_RCNN.fit(X_train_Glove, y_train, validation_data=(X_test_Glove, y_test),
               epochs=15, batch_size=128, verbose=2)

predicted = model_RCNN.predict(X_test_Glove)
predicted = np.argmax(predicted, axis=1)
print(metrics.classification_report(y_test, predicted))
This is what the distribution looks like (0: hate, 1: offensive, 2: neither)

Model summary

Results:

  • Classification report

  • Is this the right approach, or am I missing something here?

    Generally speaking, there are two areas where you can tackle overfitting:

  • Improve the data

    • More unique data
    • Oversampling (to balance the data)
  • Limit the network structure

    • Dropout (you already implemented this)
    • Fewer parameters (you may want to benchmark against a much smaller network)
    • Regularization (e.g. L1 and L2)
  • I would suggest trying fewer parameters (since this is quick to do) and oversampling (since your data seem to be imbalanced); a minimal sketch follows this list.
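
    As a rough sketch of those two suggestions, assuming the same word_index, embedding_matrix, EMBEDDING_DIM, MAX_SEQUENCE_LENGTH, nclasses and split as in the question. The layer sizes, the L2 factor, and the use of class_weight as a lightweight stand-in for oversampling are illustrative choices, not a definitive recipe:

    from keras.models import Sequential
    from keras.layers import Embedding, Dropout, Conv1D, GlobalMaxPooling1D, Dense
    from keras.regularizers import l2
    import numpy as np

    slim = Sequential()
    slim.add(Embedding(len(word_index) + 1, EMBEDDING_DIM,
                       weights=[embedding_matrix],
                       input_length=MAX_SEQUENCE_LENGTH,
                       trainable=False))              # freezing GloVe takes the embedding weights out of training
    slim.add(Dropout(0.25))
    slim.add(Conv1D(64, 2, activation='relu',
                    kernel_regularizer=l2(1e-4)))     # L2 penalty on the conv weights
    slim.add(GlobalMaxPooling1D())
    slim.add(Dense(64, activation='relu', kernel_regularizer=l2(1e-4)))
    slim.add(Dense(nclasses, activation='softmax'))
    slim.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                 metrics=['accuracy'])

    # class_weight is a cheap alternative to oversampling: rare classes get
    # proportionally larger gradients instead of duplicated rows.
    counts = np.bincount(y_train)
    class_weight = {i: len(y_train) / (len(counts) * c) for i, c in enumerate(counts)}
    slim.fit(X_train_Glove, y_train, validation_data=(X_test_Glove, y_test),
             epochs=15, batch_size=128, class_weight=class_weight, verbose=2)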

    Additionally, try hyperparameter tuning: make a lot of networks with different parameters, then choose the best one.

    Note: if you do hyperparameter tuning, make sure to keep an extra validation set, because you can easily overfit your test set this way.
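
    For instance (a sketch; the 60/20/20 proportions are just an example):

    from sklearn.model_selection import train_test_split

    # Hold out a final test set first, then split the remainder into train/validation.
    X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)
    # Tune hyperparameters against (X_val, y_val); evaluate on (X_test, y_test) only once at the end.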

    Side note: sometimes when troubleshooting a NN it helps to set the optimizer to basic stochastic gradient descent. It slows training down a lot, but makes the progression much clearer.
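
    In Keras that swap would look something like this (the learning rate is illustrative; older Keras versions use lr= instead of learning_rate=):

    from keras.optimizers import SGD

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=SGD(learning_rate=0.01),  # plain SGD, no momentum
                  metrics=['accuracy'])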


    Good luck!

    Your first layer has 1M parameters. I don't know whether that is intentional, but it looks huge.
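
    A quick sanity check of where a figure like that comes from (the vocabulary size below is an assumed example; the real value is len(word_index) + 1):

    # Embedding parameters = vocabulary size x embedding dimension.
    vocab_size = 10000 + 1             # e.g. a ~10k-word tweet vocabulary
    EMBEDDING_DIM = 100                # a typical GloVe dimension
    print(vocab_size * EMBEDDING_DIM)  # 1000100 -- all trainable when trainable=True
    # Setting trainable=False on the Embedding layer keeps the GloVe vectors
    # fixed and removes these weights from the trainable count.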