Python: classification report between two files
I want to compute a score between two files. Both contain the same data but with different labels. The labels in the train data have been corrected; the labels in the test data not necessarily. I want to know the accuracy, recall and F-score.
import pandas
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score
df_train = pd.read_csv('train.csv', sep = ',')
df_test = pd.read_csv('teste.csv', sep = ',')
vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])
y_train = df_train['label']
vec_test = TfidfVectorizer()
X_test = vec_test.fit_transform(df_train['text'])
y_test = df_test['label']
clf = LogisticRegression(penalty='l2', multi_class = 'multinomial',solver ='newton-cg')
y_pred = clf.predict(X_test)
print ("Accuracy on training set:")
print (clf.score(X_train, y_train))
print ("Accuracy on testing set:")
print (clf.score(X_test, y_test))
print ("Classification Report:")
print (metrics.classification_report(y_test, y_pred))
A silly data example:
TRAIN
text,label
dogs are cool,animal
flowers are beautifil,plants
pen is mine,objet
beyonce is an artist,person
TEST
text,label
dogs are cool,objet
flowers are beautifil,plants
pen is mine,person
beyonce is an artist,animal
Error:
Traceback (most recent call last):
  File "accurity.py", line 30, in <module>
    y_pred = clf.predict(X_test)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 298, in decision_function
    "yet" % {'name': type(self).__name__})
sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet
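I just want to compute the accuracy on the test set.
You first have to train the classifier object on X_train, and only then call the predict function on X_test. Something like this: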
clf = LogisticRegression(penalty='l2', multi_class = 'multinomial',solver ='newton-cg')
#Then train the classifier over training data
clf.fit(X_train, y_train)
#Then use predict function to make predictions
y_pred = clf.predict(X_test)
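The fit-before-predict order can be checked on a tiny example. This is only a minimal sketch: it reuses the four sample rows from the question as inline lists instead of reading the CSV files, and it uses LogisticRegression with its default settings (an assumption made for brevity, not the asker's exact configuration).

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Sample rows from the question, inlined for illustration.
texts = ["dogs are cool", "flowers are beautifil", "pen is mine", "beyonce is an artist"]
labels = ["animal", "plants", "objet", "person"]

X = TfidfVectorizer().fit_transform(texts)
clf = LogisticRegression()  # default settings, assumed for this sketch

# Calling predict before fit raises NotFittedError, as in the traceback.
try:
    clf.predict(X)
    raised = False
except NotFittedError:
    raised = True
print("predict before fit raised NotFittedError:", raised)

# After fit, predict returns one label per input row.
clf.fit(X, labels)
y_pred = clf.predict(X)
print(y_pred)
```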
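You are fitting a new TfidfVectorizer on the test data. This will give wrong results. You should use the same object that you fitted on the train data.
Do it like this: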
vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])
X_test = vec_train.transform(df_test['text'])
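One way to see why a second vectorizer is a problem is to compare feature dimensions. The sketch below is illustrative only: the two-row test_texts list is a hypothetical test set, chosen so that its vocabulary differs from the training one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["dogs are cool", "flowers are beautifil", "pen is mine", "beyonce is an artist"]
test_texts = ["dogs are cool", "pen is mine"]  # hypothetical test set

# Fit once on the training text, then reuse the same object for the test text.
vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(train_texts)
X_test_ok = vec_train.transform(test_texts)

# A separately fitted vectorizer builds its own vocabulary, so its columns
# no longer line up with the columns the classifier was trained on.
vec_other = TfidfVectorizer()
X_test_bad = vec_other.fit_transform(test_texts)

print("train features:          ", X_train.shape[1])
print("test, shared vectorizer: ", X_test_ok.shape[1])
print("test, separate vectorizer:", X_test_bad.shape[1])
```

Even when the two column counts happen to match, the same column index can stand for different words, so the predictions would still not be meaningful.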
After that, as @MohammedKashif said, you need to train the logistic regression model first and then predict on the test data:
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
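After that, you can run the scoring code without any errors.
You have not fitted your model at all! First you should call the fit() function, and then use predict. You can use a confusion matrix to count the correct and incorrect predictions.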
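Putting the suggestions from the answers together, a minimal end-to-end sketch could look like the following. Several things here are assumptions for illustration: the sample rows are loaded from inline strings through io.StringIO instead of train.csv and teste.csv, LogisticRegression is used with default settings, and zero_division=0 is passed only to silence warnings for labels that are never predicted.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Inline stand-ins for train.csv and teste.csv (sample data from the question).
TRAIN_CSV = """text,label
dogs are cool,animal
flowers are beautifil,plants
pen is mine,objet
beyonce is an artist,person
"""
TEST_CSV = """text,label
dogs are cool,objet
flowers are beautifil,plants
pen is mine,person
beyonce is an artist,animal
"""

df_train = pd.read_csv(io.StringIO(TRAIN_CSV))
df_test = pd.read_csv(io.StringIO(TEST_CSV))

# One vectorizer: fit on the train text, transform the test text.
vec = TfidfVectorizer()
X_train = vec.fit_transform(df_train["text"])
X_test = vec.transform(df_test["text"])

# Fit the classifier before predicting.
clf = LogisticRegression()
clf.fit(X_train, df_train["label"])
y_pred = clf.predict(X_test)

# Score the predictions against the labels stored in the test file.
labels = sorted(df_train["label"].unique())
acc = accuracy_score(df_test["label"], y_pred)
cm = confusion_matrix(df_test["label"], y_pred, labels=labels)
report = classification_report(df_test["label"], y_pred, labels=labels, zero_division=0)

print("Accuracy on testing set:", acc)
print("Confusion matrix (rows = test-file labels, columns = predictions):")
print(cm)
print(report)
```

The diagonal of the confusion matrix counts rows where the prediction agrees with the label in the test file; everything off the diagonal is a disagreement.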