Python: how to use a trained classifier to predict a new dataset
I trained a model with a Gaussian classifier, and it has 63% accuracy. Now I need to use this model to predict the data in a different file. How do I do that? This is the code I wrote:
```python
import re

import joblib  # sklearn.externals.joblib was removed in newer scikit-learn releases
import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
import phrasemachine as pm
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.util import ngrams
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation
from sklearn.naive_bayes import GaussianNB

nltk.download('stopwords')

dataset = pd.read_csv('fno.tsv', delimiter='\t', quoting=3)

# Clean the first 400 narratives: keep letters only, lowercase, drop stopwords, stem
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the set once, not once per word
corpus = []
for j in range(0, 400):
    review = re.sub('[^a-zA-Z]', ' ', dataset['Final Narrative'][j])
    review = review.lower().split()
    review = [ps.stem(word) for word in review if word not in stop_words]
    corpus.append(' '.join(review))

# Model 1: bag of words on the cleaned corpus
cv = CountVectorizer()
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:400, 17].values  # same 400 rows as X, otherwise the lengths differ
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=0)
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Model 2: TF-IDF on the raw narratives
tf = TfidfVectorizer()
text_tf = tf.fit_transform(dataset['Final Narrative'])
X_train, X_test, y_train, y_test = train_test_split(
    text_tf, dataset['Source of Hazard'], test_size=0.3, random_state=123)

# Accuracy check (GaussianNB needs dense input, hence .toarray())
clf = GaussianNB().fit(X_train.toarray(), y_train)
predicted = clf.predict(X_test.toarray())
print("GaussianNB Accuracy:", metrics.accuracy_score(y_test, predicted))
```
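Now I have another file, called data, which contains only the features (X) to predict and no labels (y). How can I use the classifier above to predict this new dataset?

`model.predict()` returns the predictions of a classification algorithm. Predicting on the new file looks like this: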
```python
model.predict(X_data)
```
This outputs the predicted class for each row of `X_data`.
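What the answer leaves implicit is where `X_data` comes from: the new texts have to be turned into features by the same vectorizer that was fitted on the training data, using `transform` rather than `fit_transform`, so they are mapped onto the training vocabulary and end up with the same columns. The sketch below assumes the TF-IDF model from the question (`tf` and `clf`); the tiny training set and the `new_texts` list are made-up stand-ins for the 'Final Narrative' and 'Source of Hazard' columns. For the CountVectorizer model (`cv` and `model`), the new texts would likewise need to go through the same cleaning loop first and then `cv.transform`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical training data standing in for dataset['Final Narrative'] / dataset['Source of Hazard']
train_texts = [
    "worker fell from ladder",
    "ladder slipped and worker fell",
    "hand caught in machine",
    "machine crushed finger",
]
train_labels = ["fall", "fall", "machine", "machine"]

# Fit the vectorizer and the classifier on the training data only
tf = TfidfVectorizer()
X_train = tf.fit_transform(train_texts)
clf = GaussianNB().fit(X_train.toarray(), train_labels)  # GaussianNB needs a dense array

# New, unlabeled texts (hypothetical contents of the text column in the second file)
new_texts = ["employee fell off a ladder", "finger caught in the machine"]

# transform() reuses the training vocabulary; words unseen during training are ignored
X_data = tf.transform(new_texts).toarray()
predicted = clf.predict(X_data)
print(predicted)
```

Calling `fit_transform` on the new texts instead would build a fresh vocabulary; `predict` would then typically raise an error about the number of features, or, if the column counts happened to match, return meaningless predictions.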
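If the predictions are made in a separate script, the fitted vectorizer has to be saved alongside the classifier. A minimal sketch using the standalone `joblib` package (the `sklearn.externals.joblib` import in the question no longer exists in recent scikit-learn releases). The file names `model.joblib` and `data.tsv`, the temporary directory, the toy data, and the assumption that the new file has the same 'Final Narrative' column are all hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB

# ---- training script ----
# Hypothetical stand-in for pd.read_csv('fno.tsv', delimiter='\t', quoting=3)
train = pd.DataFrame({
    "Final Narrative": ["worker fell from ladder", "ladder slipped and worker fell",
                        "hand caught in machine", "machine crushed finger"],
    "Source of Hazard": ["fall", "fall", "machine", "machine"],
})
tf = TfidfVectorizer()
clf = GaussianNB().fit(tf.fit_transform(train["Final Narrative"]).toarray(),
                       train["Source of Hazard"])

workdir = tempfile.mkdtemp()
model_path = os.path.join(workdir, "model.joblib")  # hypothetical file name
# Save the fitted vectorizer together with the classifier
joblib.dump({"vectorizer": tf, "classifier": clf}, model_path)

# ---- prediction script ----
# Hypothetical new file: same text column, no label column
data_path = os.path.join(workdir, "data.tsv")
pd.DataFrame({"Final Narrative": ["employee fell off a ladder",
                                  "finger caught in the machine"]}).to_csv(
    data_path, sep="\t", index=False)

saved = joblib.load(model_path)
new_data = pd.read_csv(data_path, delimiter="\t", quoting=3)
# Reuse the saved vectorizer with transform(), never fit_transform()
X_data = saved["vectorizer"].transform(new_data["Final Narrative"]).toarray()
new_data["Predicted Source of Hazard"] = saved["classifier"].predict(X_data)
print(new_data)
```

Saving both objects in one file keeps them in sync: a classifier reloaded without the exact vectorizer it was trained with cannot interpret new feature columns correctly.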