How to plot precision and recall for a multiclass classifier in Python?


python matplotlib scikit-learn


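I am using scikit-learn and I want to plot precision and recall curves. The classifier I am using is RandomForestClassifier. All the resources in the scikit-learn documentation use binary classification. Also, can I plot a ROC curve for a multiclass problem?

Besides, the only multilabel example I found uses an SVM, which has a decision_function; RandomForest does not have one.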

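Precision-recall curves are typically used in binary classification to study the output of a classifier. In order to extend the precision-recall curve and average precision to multi-class or multi-label classification, it is necessary to binarize the output. One curve can be drawn per label, but one can also draw a precision-recall curve by considering each element of the label indicator matrix as a binary prediction (micro-averaging).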
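The micro-averaging idea described above can be sketched as follows. This is a minimal illustration, not the answer's own code: it assumes the small bundled digits dataset as a stand-in for real data and uses RandomForestClassifier directly (it handles multiclass targets natively) instead of wrapping it in OneVsRestClassifier.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumption: the bundled digits dataset stands in for the real data.
X, y = load_digits(return_X_y=True)
classes = np.unique(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# predict_proba returns one column of scores per class, ordered like forest.classes_.
y_score = forest.predict_proba(X_test)
# Indicator matrix with the same shape: 1 where the sample belongs to the class.
Y_test = label_binarize(y_test, classes=classes)

# Micro-averaging: flatten both matrices and treat every cell as one binary prediction.
micro_precision, micro_recall, _ = precision_recall_curve(Y_test.ravel(), y_score.ravel())
micro_ap = average_precision_score(Y_test, y_score, average="micro")
print("micro-averaged average precision: {:.3f}".format(micro_ap))
```

Passing `micro_recall` and `micro_precision` to `plt.plot` would draw the single pooled curve next to the per-class ones.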

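ROC curves are typically used in binary classification to study the output of a classifier. In order to extend the ROC curve and ROC area to multi-class or multi-label classification, it is necessary to binarize the output. One ROC curve can be drawn per label, but one can also draw a ROC curve by considering each element of the label indicator matrix as a binary prediction (micro-averaging).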
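The same pooling works for ROC. The sketch below is only an illustration under assumptions: the bundled iris dataset replaces real data, and the variable names (`micro_fpr`, `micro_tpr`, `micro_auc`) are chosen here for clarity.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumption: the bundled iris dataset (3 classes) stands in for the real data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

y_score = forest.predict_proba(X_test)
# Binarize with the classifier's own class order so the columns line up.
Y_test = label_binarize(y_test, classes=forest.classes_)

# Micro-averaged ROC: every (sample, class) cell counts as one binary decision.
micro_fpr, micro_tpr, _ = roc_curve(Y_test.ravel(), y_score.ravel())
micro_auc = auc(micro_fpr, micro_tpr)
print("micro-averaged ROC AUC: {:.3f}".format(micro_auc))
```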

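Therefore, you should binarize the output and consider a precision-recall curve and a ROC curve for each class. Since RandomForest has no decision_function, you use predict_proba to obtain the class probabilities and use them as scores.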
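To make "binarizing the output" concrete, here is a tiny sketch with made-up labels (the array `y` is hypothetical). Each class becomes one 0/1 column, and column i can then be paired with column i of the predict_proba output.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical labels for five samples drawn from three classes.
y = np.array([0, 2, 1, 2, 0])

# One indicator column per entry in `classes`, in the order given.
Y = label_binarize(y, classes=[0, 1, 2])
print(Y)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]
#  [1 0 0]]
```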

I split the code into three parts:

1. General setup, learning, and prediction
2. Precision-recall curve
3. ROC curve

1. General setup, learning, and prediction

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import precision_recall_curve, roc_curve
from sklearn.preprocessing import label_binarize

import matplotlib.pyplot as plt
#%matplotlib inline

# fetch_mldata has been removed from scikit-learn; fetch_openml replaces it
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
# fetch_openml returns the digit labels as strings, so convert them to integers
target = mnist.target.astype(int)
n_classes = len(set(target))

# one 0/1 indicator column per class
Y = label_binarize(target, classes=[*range(n_classes)])

X_train, X_test, y_train, y_test = train_test_split(mnist.data,
                                                    Y,
                                                    random_state = 42)

clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50,
                             max_depth=3,
                             random_state=0))
clf.fit(X_train, y_train)

y_score = clf.predict_proba(X_test)

2. Precision-recall curve

# precision recall curve
precision = dict()
recall = dict()
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i],
                                                        y_score[:, i])
    plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))
    
plt.xlabel("recall")
plt.ylabel("precision")
plt.legend(loc="best")
plt.title("precision vs. recall curve")
plt.show()
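A per-class curve is often summarized by a single number. As a hedged sketch, the block below computes the average precision of each class with `average_precision_score`; the small `y_true` and `y_score` arrays are invented for illustration and stand in for the `y_test` and `y_score` of the code above.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical indicator matrix (6 samples, 3 classes) and matching scores.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                   [1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_score = np.array([[0.70, 0.20, 0.10], [0.10, 0.60, 0.30],
                    [0.20, 0.20, 0.60], [0.50, 0.40, 0.10],
                    [0.40, 0.35, 0.25], [0.10, 0.50, 0.40]])

# average=None returns one average-precision value per column (class).
per_class_ap = average_precision_score(y_true, y_score, average=None)
for i, ap in enumerate(per_class_ap):
    print("class {}: AP = {:.3f}".format(i, ap))
```

These values could be added to the legend labels, for example `label='class {} (AP={:.2f})'.format(i, per_class_ap[i])`.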

3. ROC curve

# roc curve
fpr = dict()
tpr = dict()

for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i],
                                  y_score[:, i])
    plt.plot(fpr[i], tpr[i], lw=2, label='class {}'.format(i))

plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.legend(loc="best")
plt.title("ROC curve")
plt.show()
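The area under each per-class ROC curve can be obtained from the same `fpr`/`tpr` arrays. The sketch below uses invented `y_true` and `y_score` arrays in place of the real test data and cross-checks the result against `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical indicator matrix (6 samples, 3 classes) and matching scores.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                   [1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_score = np.array([[0.70, 0.20, 0.10], [0.10, 0.60, 0.30],
                    [0.20, 0.20, 0.60], [0.50, 0.40, 0.10],
                    [0.40, 0.35, 0.25], [0.10, 0.50, 0.40]])

roc_auc = dict()
for i in range(y_true.shape[1]):
    # One-vs-rest ROC curve for class i, then the area under it.
    fpr_i, tpr_i, _ = roc_curve(y_true[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr_i, tpr_i)
    print("class {}: AUC = {:.3f}".format(i, roc_auc[i]))

# The same per-class values in a single call.
per_class_auc = roc_auc_score(y_true, y_score, average=None)
```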

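There is a passage with an example of this. Isn't that what you are looking for?

@Yohst That example uses an SVM and decision_function, whereas RandomForest has no decision_function.

Why use OneVsRestClassifier? Doesn't RandomForest already support multiclass?

I got these errors when running the first part: UserWarning: Label not 0 is present in all training examples. UserWarning: Label not 1 is present in all training examples. UserWarning: Label not 2 is present in all training examples.

Note that a warning is not an error. Regarding the line Y = label_binarize(mnist.target, classes=[*range(n_classes)]), you should pass the classes that actually occur in your dataset. In my example the classes are [0, 1, 2, ..., 9].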
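One way to follow that advice is to derive the class list from the labels themselves rather than hard-coding a range. The sketch below uses a made-up label array; it also shows that a class listed in `classes` but absent from the data produces an all-zero column, which appears to be the kind of constant column the warnings above complain about.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical dataset whose labels are not 0..n-1.
y = np.array([3, 7, 9, 7, 3])

# Take the classes from the data so every column has at least one positive.
classes = np.unique(y)
Y = label_binarize(y, classes=classes)
print(Y.shape)           # (5, 3)

# A class that never occurs in y (here 5) gives a column of zeros.
Y_extra = label_binarize(y, classes=[3, 5, 7, 9])
print(Y_extra[:, 1])     # [0 0 0 0 0]
```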