Python 3.x SVM classifier: saving the trained model

python-3.x machine-learning

I'm new to machine learning. I'm using SGDClassifier to classify my documents. I trained the model and used pickle to save it.

Code for training the model, in classify.py:

import pickle

from sklearn import linear_model
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

corpus = df2.title_desc  # df2 is my dataframe with 2 columns: title_desc and category
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(corpus).todense()

variables = tfidf_matrix
labels = df2.category

variables_train, variables_test, labels_train, labels_test = train_test_split(
    variables, labels, test_size=0.1)

svm_classifier = linear_model.SGDClassifier(loss='hinge', alpha=0.0001)
svm_classifier = svm_classifier.fit(variables_train, labels_train)

# Only the classifier is saved here
with open('my_dumped_classifier.pkl', 'wb') as fid:
    pickle.dump(svm_classifier, fid)

After dumping the model to a file, I created another .py file to test the model:

test.py

import pickle

from sklearn import linear_model
from sklearn.feature_extraction.text import TfidfVectorizer

corpus_test = df_test.title_desc  # df_test is my dataframe with 2 columns: title_desc and category

vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix_test = vectorizer.fit_transform(corpus_test).todense()

svm_classifier = linear_model.SGDClassifier(loss='hinge', alpha=0.0001)

with open('my_dumped_classifier.pkl', 'rb') as fid:
    svm_classifier = pickle.load(fid)

tfidf_matrix_test = vectorizer.transform(corpus_test).todense()
svm_predictions = svm_classifier.predict(tfidf_matrix_test)

I'm not sure about the logic I've written in test.py. The line

svm_predictions = svm_classifier.predict(tfidf_matrix_test)

raises the error 'ValueError: X has 249 features per sample; expecting 1050'.

Please suggest a solution.

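This is most likely because the model was trained on 1050 features. The dumped model therefore expects 1050 features, while your test data has only 249. The mismatch arises because test.py creates a new TfidfVectorizer and calls fit_transform on the test corpus, which builds a different (smaller) vocabulary than the one learned during training. Save the fitted vectorizer along with the classifier, and in test.py call only transform() on the loaded vectorizer so that the test matrix has the same columns as the training matrix.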
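One way to keep the feature count consistent is to pickle the fitted vectorizer together with the classifier and reuse both at prediction time. The following is a minimal sketch, not the asker's actual code: the toy texts and labels, the dictionary keys, and the file name my_dumped_model.pkl are all made up for illustration, and the two script halves are shown in one file so the example runs on its own.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data standing in for df2.title_desc and df2.category.
train_texts = [
    "cheap flights to paris",
    "hotel deals in rome",
    "python machine learning tutorial",
    "training a linear classifier in python",
]
train_labels = ["travel", "travel", "tech", "tech"]

# --- classify.py side: fit vectorizer and classifier, then save BOTH ---
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)  # sparse matrix, used as-is
classifier = SGDClassifier(loss="hinge", alpha=0.0001, random_state=0)
classifier.fit(X_train, train_labels)

# Illustrative file name in a temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "my_dumped_model.pkl")
with open(model_path, "wb") as fid:
    pickle.dump({"vectorizer": vectorizer, "classifier": classifier}, fid)

# --- test.py side: load both objects and call transform() only ---
with open(model_path, "rb") as fid:
    saved = pickle.load(fid)
loaded_vectorizer = saved["vectorizer"]
loaded_classifier = saved["classifier"]

test_texts = ["a short python tutorial"]
# transform() reuses the training vocabulary, so the column count matches.
X_test = loaded_vectorizer.transform(test_texts)
predictions = loaded_classifier.predict(X_test)

print(X_train.shape[1], X_test.shape[1], predictions)
```

The key point is that fit_transform is called once, on the training corpus, and test.py never refits the vectorizer. Wrapping the vectorizer and classifier in a single sklearn.pipeline.Pipeline and pickling that object would be another way to achieve the same thing.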