Python: How do I integrate NLP into a CNN model?
Tags: python, keras, nlp, cnn

I am working on combining a CNN machine-learning model with NLP (multi-label classification). I have read some articles reporting good results when applying CNNs to multi-label classification, and I am trying to test such a model in Python. I have read many articles on using NLP with neural networks, but I have a piece of code that does not work and throws error after error (every time I fix one error, I get another). I am no longer looking for paid freelancers to fix the code; I hired 5 people, and none of them could fix it. You are my last hope. I hope someone can help me fix this code and get it working. First, here is my dataset (a sample of 100 records, just to make sure the code works; I know the accuracy will not be high, and I will tune and improve the model later). For now I only want to get this code running, although suggestions on how to improve accuracy are certainly welcome. One of the errors I get:
InvalidArgumentError: indices[1] = [0,13] is out of order. Many sparse ops require sorted indices.
Use `tf.sparse.reorder` to create a correctly ordered copy.
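(Editor's note, not part of the original question: this error typically appears because `TfidfVectorizer` returns a scipy sparse matrix, which some Keras versions mishandle when they convert it to a `tf.SparseTensor`. A minimal sketch of the common workaround, densifying the matrix before calling `fit`; the small matrix here is an illustrative stand-in, and for large vocabularies this conversion costs memory:)

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny stand-in for the TF-IDF output (a scipy CSR sparse matrix)
X_sparse = csr_matrix(np.array([[0.0, 0.5, 0.0],
                                [0.3, 0.0, 0.7]]))

# Converting to a dense NumPy array sidesteps the sparse-ops error,
# since Keras' model.fit handles dense arrays directly
X_dense = X_sparse.toarray()
print(X_dense.shape)  # (2, 3)
```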
And here is my code:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from keras.layers import *
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
from keras.models import *
# Load Dataset
df_text = pd.read_csv("J:\\__DataSets\\__Samples\\Test\\data100\\text100.csv")
df_results = pd.read_csv("J:\\__DataSets\\__Samples\\Test\\data100\\results100.csv")
df = pd.merge(df_text,df_results, on="ID")
#Prepare multi-label
Labels = []
for i in df['Code']:
    Labels.append(i.split(","))
df['Labels'] = Labels
multilabel_binarizer = MultiLabelBinarizer()
multilabel_binarizer.fit(df['Labels'])
y = multilabel_binarizer.transform(df['Labels'])
X = df['Text'].values
#TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)
xtrain, xval, ytrain, yval = train_test_split(X, y, test_size=0.2, random_state=9)
# create TF-IDF features
X_train_count = tfidf_vectorizer.fit_transform(xtrain)
X_test_count = tfidf_vectorizer.transform(xval)
#Prepare Model
input_dim = X_train_count.shape[1] # Number of features
output_dim=len(df['Labels'].explode().unique())
sequence_length = input_dim
vocabulary_size = X_train_count.shape[0]
embedding_dim = output_dim
filter_sizes = [3,4,5]
num_filters = 512
drop = 0.5
epochs = 100
batch_size = 30
#CNN Model
inputs = Input(shape=(sequence_length,), dtype='int32')
embedding = Embedding(input_dim=vocabulary_size, output_dim=embedding_dim, input_length=sequence_length)(inputs)
reshape = Reshape((sequence_length,embedding_dim,1))(embedding)
conv_0 = Conv2D(num_filters, kernel_size=(filter_sizes[0], embedding_dim), padding='valid', kernel_initializer='normal', activation='relu')(reshape)
conv_1 = Conv2D(num_filters, kernel_size=(filter_sizes[1], embedding_dim), padding='valid', kernel_initializer='normal', activation='relu')(reshape)
conv_2 = Conv2D(num_filters, kernel_size=(filter_sizes[2], embedding_dim), padding='valid', kernel_initializer='normal', activation='relu')(reshape)
maxpool_0 = MaxPool2D(pool_size=(sequence_length - filter_sizes[0] + 1, 1), strides=(1,1), padding='valid')(conv_0)
maxpool_1 = MaxPool2D(pool_size=(sequence_length - filter_sizes[1] + 1, 1), strides=(1,1), padding='valid')(conv_1)
maxpool_2 = MaxPool2D(pool_size=(sequence_length - filter_sizes[2] + 1, 1), strides=(1,1), padding='valid')(conv_2)
concatenated_tensor = Concatenate(axis=1)([maxpool_0, maxpool_1, maxpool_2])
flatten = Flatten()(concatenated_tensor)
dropout = Dropout(drop)(flatten)
output = Dense(units=2, activation='softmax')(dropout)
# this creates a model that includes
model = Model(inputs=inputs, outputs=output)
#Compile
checkpoint = ModelCheckpoint('weights.{epoch:03d}-{val_acc:.4f}.hdf5', monitor='val_acc', verbose=1, save_best_only=True, mode='auto')
adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy'])
print("Training Model...")
model.summary()
#Fit
model.fit(X_train_count, ytrain, batch_size=batch_size, epochs=epochs, verbose=1, callbacks=[checkpoint], validation_data=(X_test_count, yval)) # starts training
#Accuracy
loss, accuracy = model.evaluate(X_train_count, ytrain, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test_count, yval, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
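(Editor's note on the model's input: the code above feeds TF-IDF rows into an Embedding layer, but Keras Embedding layers expect integer token indices, not float TF-IDF weights. A hedged sketch, using `tensorflow.keras` and illustrative texts, of producing the kind of input an Embedding layer consumes:)

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative stand-ins for the Text column
texts = ["chief complaint headache", "fever and chills"]

# Tokenizer maps each word to an integer index; pad_sequences
# gives every sample the fixed length the Embedding layer expects
tok = Tokenizer(num_words=1000)
tok.fit_on_texts(texts)
seqs = tok.texts_to_sequences(texts)
padded = pad_sequences(seqs, maxlen=10)
print(padded.shape)  # (2, 10)
```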
A sample of my dataset:
text100.csv
ID Text
1 Allergies to Drugs Attending:[**First Name3 (LF) 1**] Chief Complaint: headache and neck stiffne
2 Complaint: fever, chills, rigors Major Surgical or Invasive Procedure: Arterial l
3 Complaint: Febrile, unresponsive--> GBS meningitis and bacteremia Major Surgi
4 Allergies to Drugs Attending:[**First Name3 (LF) 45**] Chief Complaint: PEA arrest . Major Sur
5 Admitted to an outside hospital with chest pain and ruled in for myocardial infarction. She was tr
6 Known Allergies to Drugs Attending:[**First Name3 (LF) 78**] Chief Complaint: Progressive lethargy
7 Complaint: hypernatremia, unresponsiveness Major Surgical or Invasive Procedure: PEG/tra
8 Chief Complaint: cough, SOB Major Surgical or Invasive Procedure: RIJ placed Hemod
results100.csv
ID Code
1 A32,D50,G00,I50,I82,K51,M85,R09,R18,T82,Z51
2 418,475,905,921,A41,C50,D70,E86,F32,F41,J18,R11,R50,Z00,Z51,Z93,Z95
3 136,304,320,418,475,921,998,A40,B37,G00,G35,I10,J15,J38,J69,L27,L89,T81,T85
4 D64,D69,E87,I10,I44,N17
5 E11,I10,I21,I25,I47
6 905,C61,C91,E87,G91,I60,M47,M79,R50,S43
7 304,320,355,E11,E86,E87,F06,I10,I50,I63,I69,J15,J69,L89,L97,M81,N17,Z91
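(Editor's note: for reference, a minimal sketch of how `MultiLabelBinarizer`, as used in the question's code, turns comma-separated code lists like the ones above into a binary indicator matrix; the two toy rows are an assumption for illustration:)

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Two toy rows mimicking the Code column above
codes = ["A32,D50,G00", "D50,E11"]
labels = [c.split(",") for c in codes]

# One binary column per distinct code, sorted alphabetically
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

print(list(mlb.classes_))  # ['A32', 'D50', 'E11', 'G00']
print(y.tolist())          # [[1, 1, 0, 1], [0, 1, 1, 0]]
```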
I have nothing concrete to add at the moment, but the following debugging exchange from the comments may be useful:

Could you provide the dataset's output? Downloading some random zip file is not much fun; it would be far easier to help you. @Victormarica

Updated my question.

OK, I can reproduce your error and will try to debug it. I cannot get any further for now, but there are three things I can point out. First, I think your vocabulary size is wrong: it should be shape[1] rather than shape[0] (the size of the TF-IDF vectors). Second, I am not sure this is intended, but your output layer has only 2 units while you have 65 labels. Last but not least, your problem seems to go a bit deeper, and you may find this article useful. Debugging a Keras model is harder than expected; unlike a PyTorch model, you cannot simply place a breakpoint() at each training step.

The error occurs in the fit part: model.fit(X_train_count, ytrain, batch_size=batch_size, epochs=epochs, verbose=1, callbacks=[checkpoint], validation_data=(X_test_count, yval)) # starts training. If you drop the checkpoint and callbacks, do you still get the error?
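(Editor's note: pulling the comments above together, a minimal sketch of the suggested corrections: take the feature dimension from shape[1] of the TF-IDF matrix, and give the output layer one sigmoid unit per label with binary_crossentropy for multi-label classification. It uses tensorflow.keras imports, synthetic stand-in data, and a plain dense stack instead of the Conv2D/Embedding design, so it is an illustration of the fixes rather than a drop-in replacement:)

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout

# Synthetic stand-ins for the TF-IDF features and multi-label targets
n_samples, n_features, n_labels = 80, 1000, 65
X = np.random.rand(n_samples, n_features).astype("float32")
y = (np.random.rand(n_samples, n_labels) > 0.9).astype("float32")

# Corrections suggested in the comments:
# 1) the feature dimension is shape[1] of the TF-IDF matrix, not shape[0]
# 2) the output layer needs one sigmoid unit per label (65 here), not 2,
#    paired with binary_crossentropy for multi-label classification
inputs = Input(shape=(n_features,))
hidden = Dense(256, activation="relu")(inputs)
hidden = Dropout(0.5)(hidden)
outputs = Dense(n_labels, activation="sigmoid")(hidden)

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, batch_size=16, epochs=1, verbose=0)
print(model.predict(X[:2]).shape)  # (2, 65)
```

To keep the original CNN architecture instead, the texts would first need to be tokenized into padded integer sequences and fed to the Embedding layer in place of the TF-IDF matrix.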