Scikit-learn: confusion matrix for testing a sentiment analysis model


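I am testing a sentiment analysis model with NLTK. I need to add a confusion matrix to the classifier results and, if possible, precision, recall and F-measure as well. So far I only have accuracy. The movie review data has pos and neg labels. However, to train the classifier I am using "featuresets", whose format differs from the usual (sentence, label) structure. After training the classifier with "featuresets", I am not sure whether I can use confusion_matrix from sklearn.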

First, classify every item in the test set and store the predicted labels and the gold (reference) labels in two lists.

Then you can use nltk.ConfusionMatrix.

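# Collect the classifier's predictions and the gold (reference) labels
test_result = []
gold_result = []

# Each item in testing_set is a (features, label) pair
for features, label in testing_set:
    test_result.append(classifier.classify(features))
    gold_result.append(label)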
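
The loop above works because each entry of an NLTK feature set is a (feature dict, label) pair, so the gold label is still available at index 1 even though there is no raw sentence. Below is a minimal sketch of that shape. The toy `testing_set` and the `KeywordClassifier` class are hypothetical stand-ins (not part of NLTK) that only mimic the `classify()` method:

```python
# Toy data in the (feature_dict, label) shape NLTK classifiers expect.
# The feature names and values are made up for illustration.
testing_set = [
    ({"great": True, "boring": False}, "pos"),
    ({"great": False, "boring": True}, "neg"),
    ({"great": True, "boring": True}, "neg"),
]


class KeywordClassifier:
    """Hypothetical stand-in exposing a classify() method like an NLTK classifier."""

    def classify(self, features):
        return "pos" if features.get("great") else "neg"


classifier = KeywordClassifier()

# Predictions come from the classifier; gold labels come from the data itself.
test_result = [classifier.classify(features) for features, _ in testing_set]
gold_result = [label for _, label in testing_set]

print(test_result)
print(gold_result)
```

With a real NLTK classifier only the `classifier` object changes; the two lists are built the same way.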

Now you can compute the different metrics.

import nltk

# First argument is the reference (gold) list, second is the predicted list
CM = nltk.ConfusionMatrix(gold_result, test_result)
print(CM)

print("Naive Bayes Algo accuracy percent: " + str(nltk.classify.accuracy(classifier, testing_set) * 100) + "\n")

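from collections import Counter

labels = {'pos', 'neg'}

# CM[i, j] counts items whose gold label is i and whose predicted label is j
TP, FN, FP = Counter(), Counter(), Counter()
for i in labels:
    for j in labels:
        if i == j:
            TP[i] += int(CM[i, j])
        else:
            FN[i] += int(CM[i, j])  # gold i predicted as j: a miss for i
            FP[j] += int(CM[i, j])  # and a false alarm for j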

print("label\tprecision\trecall\tf_measure")
for label in sorted(labels):
    precision, recall = 0, 0
    if TP[label] == 0:
        f_measure = 0
    else:
        precision = float(TP[label]) / (TP[label]+FP[label])
        recall = float(TP[label]) / (TP[label]+FN[label])
        f_measure = float(2) * (precision * recall) / (precision + recall)
    print(label+"\t"+str(precision)+"\t"+str(recall)+"\t"+str(f_measure))
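
To make the counting explicit, here is a small self-contained sketch that derives the same per-label quantities directly from two label lists, without NLTK. The helper name `per_label_scores` and the sample lists are assumptions for illustration only:

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall, f_measure)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1  # the gold label was missed
            fp[p] += 1  # the predicted label was a false alarm

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        if tp[label] == 0:
            # Same convention as the code above: report zeros when there are no hits.
            scores[label] = (0.0, 0.0, 0.0)
            continue
        precision = tp[label] / (tp[label] + fp[label])
        recall = tp[label] / (tp[label] + fn[label])
        f_measure = 2 * precision * recall / (precision + recall)
        scores[label] = (precision, recall, f_measure)
    return scores


# Made-up sample lists standing in for gold_result and test_result.
gold = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "neg", "neg", "pos"]

scores = per_label_scores(gold, predicted)
for label, (p, r, f) in scores.items():
    print(f"{label}\t{p:.3f}\t{r:.3f}\t{f:.3f}")
```

In this toy case each label has two hits, one miss and one false alarm, so precision, recall and F-measure all come out to 2/3.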

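For reference, see "How to calculate precision and recall".

You can also use sklearn.metrics for these calculations, passing it the gold_result and test_result values.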

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

print('\nClassification report:\n', classification_report(gold_result, test_result))
print('\nConfusion matrix:\n', confusion_matrix(gold_result, test_result))
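
Because both lists contain plain string labels, they can be passed to scikit-learn directly, regardless of how the feature sets were built. As a hedged sketch with made-up sample lists, `precision_recall_fscore_support` returns the per-label numbers as arrays, and passing `labels=` fixes the row and column order of the confusion matrix:

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up sample lists standing in for gold_result and test_result.
gold_result = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_result = ["pos", "pos", "neg", "neg", "neg", "pos"]

labels = ["neg", "pos"]  # explicit order for rows and columns

# Rows are gold labels, columns are predicted labels, both in `labels` order.
cm = confusion_matrix(gold_result, test_result, labels=labels)
print(cm)

precision, recall, f_measure, support = precision_recall_fscore_support(
    gold_result, test_result, labels=labels
)
for name, p, r, f, n in zip(labels, precision, recall, f_measure, support):
    print(f"{name}\t{p:.3f}\t{r:.3f}\t{f:.3f}\t{n}")
```

Note that scikit-learn takes the gold labels first and the predictions second, the same argument order as nltk.ConfusionMatrix.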

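Possible duplicate?

Please edit with more information. Code-only and "try this" answers are discouraged, because they contain no searchable content and don't explain why someone should "try this". We make an effort here to be a resource for knowledge.

@RAVI, I don't understand how you get the predicted results. As I see it, you store all the classified test values in one list (test_result) and the reference values in another list (gold_result). Where are the predicted results?

@RAVI, the link you gave for nltk.metrics says I have to build 2 sets for each classification label. I suppose I have to modify all my code to do that, right?

The test_result values are the predicted results. As you can see, test_result holds the output of classifier.classify().

For the other metrics (precision, recall, F-measure), do I have to change my code to build 2 sets for each classification label, or is there a better way?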
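
On the question raised in the comments about building two sets per label: the set-based view can be derived from the same two lists, so the training code does not need to change. The sketch below uses plain Python sets and made-up sample lists. The names `ref_sets` and `test_sets` are illustrative assumptions, and it only mirrors the idea of comparing a reference set with a test set rather than calling nltk.metrics itself:

```python
from collections import defaultdict

# Made-up sample lists standing in for gold_result and test_result.
gold_result = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_result = ["pos", "pos", "neg", "neg", "neg", "pos"]

# For each label, collect the indices of the items carrying that label.
ref_sets = defaultdict(set)   # built from the gold labels
test_sets = defaultdict(set)  # built from the predictions
for i, (gold, predicted) in enumerate(zip(gold_result, test_result)):
    ref_sets[gold].add(i)
    test_sets[predicted].add(i)

for label in sorted(ref_sets):
    overlap = ref_sets[label] & test_sets[label]
    # Precision: share of predicted items that are correct.
    precision = len(overlap) / len(test_sets[label]) if test_sets[label] else 0.0
    # Recall: share of gold items that were found.
    recall = len(overlap) / len(ref_sets[label])
    print(label, round(precision, 3), round(recall, 3))
```

Per-label sets like these are the two inputs that a set-based precision or recall function compares.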
