Python: TfidfVectorizer raises an error, but CountVectorizer works fine
I've been working on this all day with no luck. I managed to narrow the problem down to a single line, the one that uses TfidfVectorizer.
This is my working code:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer.fit(xtrain)
X_train_count = vectorizer.transform(xtrain)
X_test_count = vectorizer.transform(xval)
X_train_count
from keras.models import Sequential
from keras import layers
input_dim = X_train_count.shape[1] # Number of features
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
history = model.fit(X_train_count, ytrain,
                    epochs=10,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10)
But when I change it to:
from sklearn.feature_extraction.text import TfidfVectorizer
# TF-IDF initializer
vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)
vectorizer.fit(xtrain)
X_train_count = vectorizer.transform(xtrain)
X_test_count = vectorizer.transform(xval)
X_train_count
from keras.models import Sequential
from keras import layers
input_dim = X_train_count.shape[1] # Number of features
model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
history = model.fit(X_train_count, ytrain,
                    epochs=10,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10)
The only thing that changed is these two lines:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)
And then I get this error:
InvalidArgumentError: indices[1] = [0,997] is out of order. Many sparse ops require sorted indices. Use `tf.sparse.reorder` to create a correctly ordered copy. [Op:SerializeManySparse]
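How can I fix this, and why does it happen?

vectorizer.transform(...) returns a sparse matrix, which Keras does not handle well here. You just need to convert it to a plain dense array, which you can do with: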
vectorizer.transform(...).toarray()
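
To make the fix concrete, here is a minimal sketch of the conversion applied to both the training and validation data. The toy corpus is hypothetical and only stands in for xtrain and xval from the question. The second option (sorting the sparse indices in place with SciPy's sort_indices()) is not part of the answer above; it is an assumption based on what the error message asks for, and whether Keras then accepts the sparse input may depend on the TensorFlow version.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for xtrain / xval from the question.
xtrain = ["the cat sat on the mat", "the dog ate my homework", "cats and dogs"]
xval = ["the cat ate the dog"]

vectorizer = TfidfVectorizer(max_df=0.8, max_features=1000)
vectorizer.fit(xtrain)

# transform() returns a SciPy sparse matrix, not a NumPy array.
X_train_sparse = vectorizer.transform(xtrain)
X_val_sparse = vectorizer.transform(xval)

# Option 1 (the fix from the answer): densify before passing the data to Keras.
X_train_dense = X_train_sparse.toarray()
X_val_dense = X_val_sparse.toarray()

# Option 2 (assumption, untested with Keras): keep the matrix sparse,
# but sort the column indices within each row in place.
X_train_sparse.sort_indices()
X_val_sparse.sort_indices()

print(type(X_train_dense), X_train_dense.shape)
```

With option 1, X_train_dense and X_val_dense would be passed to model.fit in place of X_train_count and X_test_count. A dense array stores every zero explicitly, so memory grows with rows times features; with max_features=1000 that is usually manageable, but it is worth keeping in mind for larger vocabularies.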