How to plot a ROC curve in Python

python matplotlib plot statistics

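I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false positive rate; however, I am unable to figure out how to plot these correctly using matplotlib and calculate the AUC value. How could I do that?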

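It is not at all clear what the problem is here, but if you have an array true_positive_rate and an array false_positive_rate, then plotting the ROC curve and getting the AUC is as simple as: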

import matplotlib.pyplot as plt
import numpy as np

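x = false_positive_rate  # array of FPR values, sorted in increasing order
y = true_positive_rate   # array of the matching TPR values

# This is the ROC curve
plt.plot(x, y)
plt.show()

# This is the AUC (trapezoidal rule; np.trapz is named np.trapezoid in NumPy 2.0+)
auc = np.trapz(y, x)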
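To make the AUC line concrete, here is a minimal sketch of the trapezoidal rule that np.trapz applies, written in plain Python. The function name trapezoid_auc and the sample points are illustrative assumptions, not part of the answer above; the points are assumed to be sorted by increasing false positive rate.

```python
def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the points (fpr[i], tpr[i])."""
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0
        area += width * mean_height
    return area


# Hypothetical ROC points, sorted by increasing false positive rate.
example_fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
example_tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
example_auc = trapezoid_auc(example_fpr, example_tpr)
print(example_auc)  # 0.75
```

A diagonal ROC curve (a classifier no better than chance) gives 0.5 with the same function, which is a quick sanity check for any AUC code.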

Here is Python code for computing the ROC curve (as a scatter plot):
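A minimal sketch of such a computation, assuming binary 0/1 labels and real-valued scores where a higher score means "more likely positive". The function names roc_points and scatter_roc and the sample data are illustrative and not taken from the original answer; the plotting helper assumes matplotlib is installed.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists, one point per distinct score threshold.

    y_true holds 0/1 labels and y_score holds real-valued scores.
    Both classes must be present.
    """
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort samples by decreasing score and sweep the threshold downwards.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    fpr, tpr = [0.0], [0.0]
    true_pos = false_pos = 0
    for index, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = index == len(ranked) - 1
        if is_last or ranked[index + 1][0] != score:
            fpr.append(false_pos / n_neg)
            tpr.append(true_pos / n_pos)
    return fpr, tpr


def scatter_roc(fpr, tpr):
    """Draw the ROC points as a scatter plot (needs matplotlib installed)."""
    import matplotlib.pyplot as plt  # imported here so roc_points works without it

    plt.scatter(fpr, tpr)
    plt.plot([0, 1], [0, 1], "r--")  # chance line
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.show()


# Hypothetical labels and scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, scores)
print(fpr)  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

Calling scatter_roc(fpr, tpr) then shows the points against the chance diagonal.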

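The previous answers assume that you really did calculate TP/Sens yourself. Doing this by hand is a bad idea: it is easy to make mistakes in the calculations, so use a library function for all of this instead.

The plot_roc example in scikit-learn does exactly what you need:

The essential part of the code is: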

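from sklearn.metrics import roc_curve, auc
# y_test (label-binarized) and y_score have one column per class
fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])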
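The loop above treats each column of y_test as a separate binary problem (one class versus the rest). Here is a minimal sketch of that idea in plain Python, assuming integer class labels and one score column per class. The helper names binarize and rank_auc and the toy data are illustrative. The AUC is computed here through its rank interpretation (the fraction of positive/negative pairs in which the positive sample gets the higher score, with ties counted as half) rather than with scikit-learn's auc.

```python
def binarize(labels, positive_class):
    """One-vs-rest labels: 1 where the label equals positive_class, else 0."""
    return [1 if label == positive_class else 0 for label in labels]


def rank_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg), which is
    fine for a small illustration.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical 3-class data: true labels and one score column per class.
y_labels = [0, 1, 2, 0, 1, 2]
y_scores = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.5, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]

per_class_auc = {}
for i in range(3):
    column = [row[i] for row in y_scores]
    per_class_auc[i] = rank_auc(binarize(y_labels, i), column)
print(per_class_auc)  # {0: 1.0, 1: 0.9375, 2: 1.0}
```

Class 1 scores below 1.0 because one negative sample ties with a positive one at 0.5, so that pair contributes only half.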

Assuming your model is an sklearn predictor, here are two ways you may try:

import sklearn.metrics as metrics
# calculate the fpr and tpr for all thresholds of the classification
probs = model.predict_proba(X_test)
preds = probs[:,1]
fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
roc_auc = metrics.auc(fpr, tpr)

# method I: plt
import matplotlib.pyplot as plt
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# method II: ggplot
import pandas as pd
from ggplot import *
df = pd.DataFrame(dict(fpr = fpr, tpr = tpr))
ggplot(df, aes(x = 'fpr', y = 'tpr')) + geom_line() + geom_abline(linetype = 'dashed')

or try

ggplot(df, aes(x = 'fpr', ymin = 0, ymax = 'tpr')) + geom_line(aes(y = 'tpr')) + geom_area(alpha = 0.2) + ggtitle("ROC Curve w/ AUC = %s" % str(roc_auc))

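This is the simplest way to plot a ROC curve, given a set of ground truth labels and predicted probabilities. The best part is that it plots the ROC curve for all classes, so you get multiple neat-looking curves as well.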

import scikitplot as skplt
import matplotlib.pyplot as plt

y_true = ...    # placeholder: ground truth labels
y_probas = ...  # placeholder: predicted probabilities generated by sklearn classifier
skplt.metrics.plot_roc_curve(y_true, y_probas)
plt.show()

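Here is a sample curve generated by plot_roc_curve. I used the sample digits dataset from scikit-learn, so there are 10 classes. Notice that one ROC curve is plotted for each class.

Disclaimer: Note that this uses the scikit-plot library, which I built.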

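I have made a simple function for the ROC curve and included it in a package. I have only just started practicing machine learning, so please let me know if this code has any problems.

Have a look at the GitHub readme file for more details! :)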

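AUC curve for binary classification using matplotlib. The steps are: load the breast cancer dataset, split the dataset, fit the model, report the accuracy, and plot the AUC curve.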

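Based on multiple comments from Stack Overflow, the scikit-learn documentation and some other sources, I made a Python package to plot ROC curves (and other metrics) in a really simple way.

To install the package:

pip install plot-metric

(more info at the end of the post)

To plot a ROC curve (the example comes from the documentation):

Binary classification. Let's load a simple dataset and make a train and test set: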

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_classes=2, weights=[1,1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

Train a classifier and predict on the test set:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=50, random_state=23)
model = clf.fit(X_train, y_train)

# Use predict_proba to predict probability of the class
y_pred = clf.predict_proba(X_test)[:,1]

Now you can use plot_metric to plot the ROC curve:

import matplotlib.pyplot as plt
from plot_metric.functions import BinaryClassification
# Visualisation with plot_metric
bc = BinaryClassification(y_test, y_pred, labels=["Class 1", "Class 2"])

# Figures
plt.figure(figsize=(5,5))
bc.plot_roc_curve()
plt.show()

Result:

You can find more examples on GitHub and in the documentation of the package:

GitHub:
Documentation:

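You can also follow the approach in the official scikit-learn documentation:

There is a library called metriculous that will do that for you:

$ pip install metriculous

Let's first mock some data. This would usually come from the test dataset and the model(s):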

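import numpy as np

def normalize(array2d: np.ndarray) -> np.ndarray:
    return array2d / array2d.sum(axis=1, keepdims=True)

class_names = ["Cat", "Dog", "Pig"]
num_classes = len(class_names)
num_samples = 500

# Mock ground truth
ground_truth = np.random.choice(range(num_classes), size=num_samples, p=[0.5, 0.4, 0.1])

# Mock model predictions
perfect_model = np.eye(num_classes)[ground_truth]
noisy_model = normalize(
    perfect_model + 2 * np.random.random((num_samples, num_classes))
)
random_model = normalize(np.random.random((num_samples, num_classes)))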

Now we can use metriculous to generate a table with various metrics and diagrams, including ROC curves:

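import metriculous

metriculous.compare_classifiers(
    ground_truth=ground_truth,
    model_predictions=[perfect_model, noisy_model, random_model],
    model_names=["Perfect Model", "Noisy Model", "Random Model"],
    class_names=class_names,
    one_vs_all_figures=True,  # This line is important to include ROC curves in the output
).save_html("model_comparison.html").display()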

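The ROC curves in the output:

The plots are zoomable and draggable, and you get further details when hovering over a plot with your mouse:

When you need the probabilities as well... The following gets the AUC value and plots everything in one shot.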

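# plot_roc_curve was removed in scikit-learn 1.2; there, use
# RocCurveDisplay.from_estimator(m, xs, y) instead
from sklearn.metrics import plot_roc_curve

plot_roc_curve(m, xs, y)  # m: fitted classifier, xs: features, y: true labels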

When you already have the probabilities... you can't get the AUC value and the plot in one shot. Do the following:

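import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# y_probas: predicted probabilities of the positive class, here assumed to be
# the out-of-bag estimates of a random forest fitted with oob_score=True
y_probas = m.oob_decision_function_[:, 1]
fpr, tpr, _ = roc_curve(y, y_probas)
plt.plot(fpr, tpr, label='AUC = ' + str(round(roc_auc_score(y, y_probas), 2)))
plt.legend(loc='lower right')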

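So "preds" is basically your predict_proba scores and "model" is your classifier?

@ChrisNielsen preds is y hat; yes, model is the trained classifier.

What is "all thresholds", and how are they calculated?

@mrloud They are chosen automatically by sklearn.metrics.roc_curve.

How do you calculate y_train_true, y_train_prob, y_test_true, y_test_prob?

y_train_true, y_test_true should be readily available in a labelled dataset. y_train_prob, y_test_prob are the outputs of your trained neural network.

How do you calculate y_true, y_probas?

Reii Nakano - You are a genius in an angel's disguise. You have made my day. This package is so simple and yet so effective. You have my full respect. Just a small note on the code snippet above: shouldn't the second-to-last line be skplt.metrics.plot_roc_curve(y_true, y_probas)? Many thanks.

This should have been selected as the correct answer! Very useful package.

I am having trouble using the package. Every time I try to plot the ROC curve, it tells me I have "too many indices". I am feeding it my y_test and pred. I am able to get my predictions, but I can't get the plot because of that error. Is it because of the version of Python I am running?

I had to reshape my y_pred data to be of size Nx1 instead of just a list: y_pred.reshape(len(y_pred), 1). Now I get the error "IndexError: index 1 is out of bounds for axis 1 with size 1" instead, but a figure is drawn. I guess this is because the code expects a binary classifier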
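Regarding the shape errors discussed in the comments above, here is a minimal sketch. It assumes (as the last comment guesses) that the plotting function wants one probability column per class, so that a binary problem needs an N x 2 array rather than a flat list or an N x 1 column. The helper name to_two_columns and the sample values are illustrative only.

```python
def to_two_columns(positive_probs):
    """Turn positive-class probabilities into rows of [P(class 0), P(class 1)]."""
    rows = []
    for p in positive_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        rows.append([1.0 - p, p])
    return rows


# Hypothetical positive-class probabilities from a binary classifier.
y_pred = [0.75, 0.25, 0.5]
y_probas = to_two_columns(y_pred)
print(y_probas)  # [[0.25, 0.75], [0.75, 0.25], [0.5, 0.5]]
```

With an sklearn classifier, clf.predict_proba(X_test) already returns this two-column layout, so passing it whole (without the [:, 1] slice) avoids the conversion.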

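import matplotlib.pyplot as plt
from sklearn import metrics

# y_pred holds the predicted class labels, i.e. clf.predict(X_test)
print("Accuracy", metrics.accuracy_score(y_test, y_pred))

# Probability of the positive class, used as the ROC score
y_pred_proba = clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = metrics.roc_curve(y_test, y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr, tpr, label="data 1, auc=" + str(auc))
plt.legend(loc=4)
plt.show()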
