How to write a confusion matrix in Python?


I wrote a confusion matrix calculation in Python:

def conf_mat(prob_arr, input_arr):
        # confusion matrix
        conf_arr = [[0, 0], [0, 0]]

        for i in range(len(prob_arr)):
                if int(input_arr[i]) == 1:
                        if float(prob_arr[i]) < 0.5:
                                conf_arr[0][1] = conf_arr[0][1] + 1
                        else:
                                conf_arr[0][0] = conf_arr[0][0] + 1
                elif int(input_arr[i]) == 2:
                        if float(prob_arr[i]) >= 0.5:
                                conf_arr[1][0] = conf_arr[1][0] +1
                        else:
                                conf_arr[1][1] = conf_arr[1][1] +1

        accuracy = float(conf_arr[0][0] + conf_arr[1][1])/(len(input_arr))

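prob_arr is the array returned by my classification code; a sample looks like this:

 [1.0, 1.0, 1.0, 0.41592955657342651, 1.0, 0.0053405015805891975, 4.5321494433440449e-299, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.70943426182688163, 1.0, 1.0, 1.0, 1.0]

input_arr holds the original class labels of the dataset, and it looks like this:

[2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1]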

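What my code is trying to do is this: I get prob_arr and input_arr, and for each class (1 and 2) I check whether the instances are misclassified or not.

But my code only works for two classes. If I run it on multi-class data, it doesn't work. How can I make it work for multiple classes?

For example, for a dataset with three classes, it should return:

[[21, 7, 3], [3, 38, 6], [5, 4, 19]]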

You should map from classes to a row in the confusion matrix.

Here the mapping is trivial:

def row_of_class(classe):
    return {1: 0, 2: 1}[classe]

In your loop, compute expected_row and correct_row, and increment conf_arr[expected_row][correct_row]. You'll even end up with less code than what you started with.
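As a minimal sketch of that loop, assuming the probabilities have already been turned into class labels (the build_conf_arr name and the sample lists are illustrative, not from the question):

```python
def row_of_class(classe):
    # Map the class labels 1 and 2 to the matrix indices 0 and 1.
    return {1: 0, 2: 1}[classe]

def build_conf_arr(expected, predicted):
    # Rows are indexed by the expected class, columns by the predicted class.
    conf_arr = [[0, 0], [0, 0]]
    for exp, pred in zip(expected, predicted):
        expected_row = row_of_class(exp)
        correct_row = row_of_class(pred)
        conf_arr[expected_row][correct_row] += 1
    return conf_arr

# Hypothetical sample labels, only to exercise the function.
conf_arr = build_conf_arr([1, 1, 2, 2, 1], [1, 2, 2, 1, 1])
print(conf_arr)
```

Supporting a third class would then only mean extending the mapping and the size of conf_arr.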

This function creates confusion matrices for any number of classes.

def create_conf_matrix(expected, predicted, n_classes):
    m = [[0] * n_classes for i in range(n_classes)]
    for pred, exp in zip(predicted, expected):
        m[pred][exp] += 1
    return m

def calc_accuracy(conf_matrix):
    t = sum(sum(l) for l in conf_matrix)
    return sum(conf_matrix[i][i] for i in range(len(conf_matrix))) / t

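In contrast to your function above, you have to extract the predicted classes before calling the function, based on your classification results, e.g. something like

[1 if p >= .5 else 2 for p in classifications]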
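For illustration, here is how the pieces might fit together for two-class data shaped like the question's. This sketch assumes a probability of at least 0.5 means class 1 (as in the question's code) and shifts the labels 1 and 2 down to the indices 0 and 1 that create_conf_matrix indexes with; the sample values are made up.

```python
def create_conf_matrix(expected, predicted, n_classes):
    # Rows are indexed by the predicted class, columns by the expected class.
    m = [[0] * n_classes for _ in range(n_classes)]
    for pred, exp in zip(predicted, expected):
        m[pred][exp] += 1
    return m

def calc_accuracy(conf_matrix):
    total = sum(sum(row) for row in conf_matrix)
    correct = sum(conf_matrix[i][i] for i in range(len(conf_matrix)))
    return correct / total

# Made-up sample data in the same shape as the question's arrays.
classifications = [1.0, 0.42, 0.9, 0.01, 0.71, 0.3]
labels = [1, 1, 1, 2, 2, 2]

# Threshold to class labels 1/2, then shift to 0-based indices.
predicted = [(1 if p >= 0.5 else 2) - 1 for p in classifications]
expected = [label - 1 for label in labels]

matrix = create_conf_matrix(expected, predicted, n_classes=2)
accuracy = calc_accuracy(matrix)
print(matrix, accuracy)
```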

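In general, you need to change your probability array. Instead of having one number per instance and classifying based on whether it is greater than 0.5, you need a list of scores (one per class), and you then take the class with the largest score as the chosen class (a.k.a. argmax).

You could use a dictionary to hold the probabilities for each classification: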

prob_arr = [{classification_id: probability}, ...]

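Choosing a classification would be something like:

for instance_scores in prob_arr:
    predicted_classes = [cls for (cls, score) in instance_scores.items() if score == max(instance_scores.values())]

This handles the case where two classes have the same score. You can get a single prediction by choosing the first one in that list, but how you handle ties depends on what you're classifying.

Once you have your list of predicted classes and a list of expected classes, you can use code like the create_conf_matrix function above to create the confusion array and calculate the accuracy.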
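A small sketch of the argmax step followed by the counting step, under the assumption that each instance's scores are stored as a {class: score} dictionary. The helper names predict_classes and confusion_counts and the sample scores are hypothetical.

```python
def predict_classes(prob_arr):
    # For each instance, pick the class with the highest score.
    # max() returns the first maximal key it encounters, so ties are
    # resolved by dictionary insertion order.
    return [max(scores, key=scores.get) for scores in prob_arr]

def confusion_counts(expected, predicted):
    # Nested dict: counts[expected_class][predicted_class].
    classes = sorted(set(expected) | set(predicted))
    counts = {e: {p: 0 for p in classes} for e in classes}
    for exp, pred in zip(expected, predicted):
        counts[exp][pred] += 1
    return counts

# Made-up scores for three instances and three classes.
prob_arr = [
    {1: 0.7, 2: 0.2, 3: 0.1},
    {1: 0.1, 2: 0.3, 3: 0.6},
    {1: 0.4, 2: 0.5, 3: 0.1},
]
expected = [1, 3, 1]
predicted = predict_classes(prob_arr)
counts = confusion_counts(expected, predicted)
print(predicted)
print(counts)
```

Using a nested dictionary keyed by class label avoids having to map labels to 0-based indices first.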

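You can make your code more concise and (sometimes) faster by using numpy. For example, in the two-class case the inputs to your function can be written as boolean arrays: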

actual    = (numpy.array(input_arr) == 2)
predicted = (numpy.array(prob_arr) < 0.5)
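With both arrays in boolean form, accuracy reduces to an element-wise comparison, and the four cells of the 2x2 matrix are just masked sums. The sketch below uses made-up sample values and treats class 2 as the positive class, matching the two expressions above; the accuracy helper name is an assumption.

```python
import numpy

def accuracy(actual, predicted):
    # Fraction of positions where the two boolean arrays agree.
    return (actual == predicted).sum() / float(len(actual))

# Made-up sample data.
input_arr = [2, 1, 1, 1, 2, 1]
prob_arr = [0.1, 1.0, 0.9, 0.4, 0.7, 0.8]

actual = (numpy.array(input_arr) == 2)
predicted = (numpy.array(prob_arr) < 0.5)

# The four cells of the 2x2 confusion matrix.
tp = int((actual & predicted).sum())
tn = int((~actual & ~predicted).sum())
fp = int((~actual & predicted).sum())
fn = int((actual & ~predicted).sum())
acc = accuracy(actual, predicted)
print(tp, tn, fp, fn, acc)
```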


scikit-learn (which I recommend using anyway) includes this in its metrics module:
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [0, 1, 2, 0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 0, 0, 0, 1, 1, 0, 2, 2]
>>> confusion_matrix(y_true, y_pred)
array([[3, 0, 0],
       [1, 1, 1],
       [1, 1, 1]])

Scikit-learn provides a confusion_matrix function:
from sklearn.metrics import confusion_matrix
y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
confusion_matrix(y_actu, y_pred)

It outputs a NumPy array:
array([[3, 0, 0],
       [0, 1, 2],
       [2, 1, 3]])

But you can also create a confusion matrix using Pandas:
import pandas as pd
y_actu = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2], name='Actual')
y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2], name='Predicted')
df_confusion = pd.crosstab(y_actu, y_pred)

You will get a (nicely labeled) Pandas DataFrame:
Predicted  0  1  2
Actual
0          3  0  0
1          0  1  2
2          2  1  3

If you add margins=True, like this:
df_confusion = pd.crosstab(y_actu, y_pred, rownames=['Actual'], colnames=['Predicted'], margins=True)

you will also get the sum of each row and column:
Predicted  0  1  2  All
Actual
0          3  0  0    3
1          0  1  2    3
2          2  1  3    6
All        5  2  5   12

You can also get a normalized confusion matrix using:
# Divide each row by its row total (use the crosstab built without margins).
df_conf_norm = df_confusion.div(df_confusion.sum(axis=1), axis=0)

Predicted         0         1         2
Actual
0          1.000000  0.000000  0.000000
1          0.000000  0.333333  0.666667
2          0.333333  0.166667  0.500000

You can plot this confusion matrix using:
import numpy as np
import matplotlib.pyplot as plt
def plot_confusion_matrix(df_confusion, title='Confusion matrix', cmap=plt.cm.gray_r):
    plt.matshow(df_confusion, cmap=cmap) # imshow
    #plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(df_confusion.columns))
    plt.xticks(tick_marks, df_confusion.columns, rotation=45)
    plt.yticks(tick_marks, df_confusion.index)
    #plt.tight_layout()
    plt.ylabel(df_confusion.index.name)
    plt.xlabel(df_confusion.columns.name)

plot_confusion_matrix(df_confusion)


Or plot the normalized confusion matrix using:
plot_confusion_matrix(df_conf_norm)
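Row normalization divides each row by the number of actual instances of that class, so every row sums to 1 and the diagonal becomes the per-class recall. A minimal NumPy sketch of the same idea, using the counts from the example above (the variable names are illustrative):

```python
import numpy as np

# Counts from the example above: rows are actual classes, columns predicted.
cm = np.array([[3, 0, 0],
               [0, 1, 2],
               [2, 1, 3]])

# Divide every row by its own total; keepdims keeps the (3, 1) shape
# so that broadcasting works row by row.
row_totals = cm.sum(axis=1, keepdims=True)
cm_norm = cm / row_totals

# The diagonal of the row-normalized matrix is the per-class recall.
recall = np.diag(cm_norm)
print(cm_norm)
print(recall)
```

Dividing by cm.sum(axis=0) instead would normalize by column, which gives per-class precision on the diagonal.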


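You might also be interested in the pandas_ml project and its pip package.
With this package, a confusion matrix can be pretty-printed and plotted.
You can binarize a confusion matrix and get class statistics such as TP, TN, FP, FN, ACC, TPR, FPR, FNR, TNR (SPC), LR+, LR-, DOR, PPV, FDR, FOR, NPV, and some overall statistics.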
In [1]: from pandas_ml import ConfusionMatrix
In [2]: y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
In [3]: y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
In [4]: cm = ConfusionMatrix(y_actu, y_pred)
In [5]: cm.print_stats()
Confusion Matrix:

Predicted  0  1  2  __all__
Actual
0          3  0  0        3
1          0  1  2        3
2          2  1  3        6
__all__    5  2  5       12


Overall Statistics:

Accuracy: 0.583333333333
95% CI: (0.27666968568210581, 0.84834777019156982)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.189264302376
Kappa: 0.354838709677
Mcnemar's Test P-Value: ToDo


Class Statistics:

Classes                                        0          1          2
Population                                    12         12         12
P: Condition positive                          3          3          6
N: Condition negative                          9          9          6
Test outcome positive                          5          2          5
Test outcome negative                          7         10          7
TP: True Positive                              3          1          3
TN: True Negative                              7          8          4
FP: False Positive                             2          1          2
FN: False Negative                             0          2          3
TPR: (Sensitivity, hit rate, recall)           1  0.3333333        0.5
TNR=SPC: (Specificity)                 0.7777778  0.8888889  0.6666667
PPV: Pos Pred Value (Precision)              0.6        0.5        0.6
NPV: Neg Pred Value                            1        0.8  0.5714286
FPR: False-out                         0.2222222  0.1111111  0.3333333
FDR: False Discovery Rate                    0.4        0.5        0.4
FNR: Miss Rate                                 0  0.6666667        0.5
ACC: Accuracy                          0.8333333       0.75  0.5833333
F1 score                                    0.75        0.4  0.5454545
MCC: Matthews correlation coefficient  0.6831301  0.2581989  0.1690309
Informedness                           0.7777778  0.2222222  0.1666667
Markedness                                   0.6        0.3  0.1714286
Prevalence                                  0.25       0.25        0.5
LR+: Positive likelihood ratio               4.5          3        1.5
LR-: Negative likelihood ratio                 0       0.75       0.75
DOR: Diagnostic odds ratio                   inf          4          2
FOR: False omission rate                       0        0.2  0.4285714
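The per-class counts in the table above can be derived directly from the confusion matrix by treating each class in turn as "positive" and all others as "negative". A minimal plain-Python sketch (the class_counts helper is hypothetical and not part of pandas_ml):

```python
def class_counts(cm, k):
    # cm[i][j] counts instances of actual class i predicted as class j.
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # actual k, predicted something else
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

# Same counts as the confusion matrix printed above.
cm = [[3, 0, 0],
      [0, 1, 2],
      [2, 1, 3]]

for k in range(len(cm)):
    tp, tn, fp, fn = class_counts(cm, k)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(k, tp, tn, fp, fn, round(precision, 3), round(recall, 3))
```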

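I noticed that a new Python library for confusion matrices has been released; it may be worth a look.
If you don't want scikit-learn to do the work for you: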
    import numpy
    actual = numpy.array(actual)
    predicted = numpy.array(predicted)

    # calculate the confusion matrix; labels is numpy array of classification labels
    cm = numpy.zeros((len(labels), len(labels)))
    for a, p in zip(actual, predicted):
        cm[a][p] += 1

    # also get the accuracy easily with numpy
    accuracy = (actual == predicted).sum() / float(len(actual))

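Or take a look at a more complete implementation.
I wrote a simple class to build a confusion matrix without depending on a machine learning library.
The class can be used like this:
labels = ["cat", "dog", "velociraptor", "kraken", "pony"]
confusionMatrix = ConfusionMatrix(labels)
confusionMatrix.update("cat", "cat")
confusionMatrix.update("cat", "dog")
...
confusionMatrix.update("kraken", "velociraptor")
confusionMatrix.update("velociraptor", "velociraptor")
confusionMatrix.plot()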

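The ConfusionMatrix class:
import pylab
import collections
import numpy as np

class ConfusionMatrix:
    def __init__(self, labels):
        self.labels = labels
        self.confusion_dictionary = self.build_confusion_dictionary(labels)

    def update(self, predicted_label, expected_label):
        self.confusion_dictionary[expected_label][predicted_label] += 1

    def build_confusion_dictionary(self, label_set):
        # Nested ordered dicts: expected label -> predicted label -> count.
        expected_labels = collections.OrderedDict()
        for expected_label in label_set:
            expected_labels[expected_label] = collections.OrderedDict()
            for predicted_label in label_set:
                expected_labels[expected_label][predicted_label] = 0.0
        return expected_labels

    def convert_to_matrix(self, dictionary):
        length = len(dictionary)
        confusion_dictionary = np.zeros((length, length))
        i = 0
        for row in dictionary:
            j = 0
            for column in dictionary:
                confusion_dictionary[i][j] = dictionary[row][column]
                j += 1
            i += 1
        return confusion_dictionary

    def get_confusion_matrix(self):
        matrix = self.convert_to_matrix(self.confusion_dictionary)
        return self.normalize(matrix)

    def normalize(self, matrix):
        # Min-max scale all cells into the range [0, 1].
        amin = np.amin(matrix)
        amax = np.amax(matrix)
        return [[(((y - amin) * (1 - 0)) / (amax - amin)) for y in x] for x in matrix]

    def plot(self):
        matrix = self.get_confusion_matrix()
        pylab.figure()
        pylab.imshow(matrix, interpolation='nearest', cmap=pylab.cm.jet)
        pylab.title("Confusion Matrix")
        for i, vi in enumerate(matrix):
            for j, vj in enumerate(vi):
                pylab.text(j, i + .1, "%.1f" % vj, fontsize=12)
        pylab.colorbar()
        classes = np.arange(len(self.labels))
        pylab.xticks(classes, self.labels)
        pylab.yticks(classes, self.labels)
        pylab.ylabel('Expected label')
        pylab.xlabel('Predicted label')
        pylab.show()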
# A Simple Confusion Matrix Implementation
def confusionmatrix(actual, predicted, normalize = False):
    """
    Generate a confusion matrix for multiple classification
    @params:
        actual      - a list of integers or strings for known classes
        predicted   - a list of integers or strings for predicted classes
        normalize   - optional boolean for matrix normalization
    @return:
        matrix      - a 2-dimensional list of pairwise counts
    """
    unique = sorted(set(actual) | set(predicted))  # include labels that only appear in predictions
    matrix = [[0 for _ in unique] for _ in unique]
    imap   = {key: i for i, key in enumerate(unique)}
    # Generate Confusion Matrix
    for p, a in zip(predicted, actual):
        matrix[imap[p]][imap[a]] += 1
    # Matrix Normalization
    if normalize:
        sigma = sum([sum(matrix[imap[i]]) for i in unique])
        matrix = [row for row in map(lambda i: list(map(lambda j: j / sigma, i)), matrix)]
    return matrix

cm = confusionmatrix(
    [1, 1, 2, 0, 1, 1, 2, 0, 0, 1], # actual
    [0, 1, 1, 0, 2, 1, 2, 2, 0, 2]  # predicted
)

# And The Output
print(cm)
[[2, 1, 0], [0, 2, 1], [1, 2, 1]]

# Actual
# 0  1  2
  #  #  #   
[[2, 1, 0], # 0
 [0, 2, 1], # 1  Predicted
 [1, 2, 1]] # 2

cm = confusionmatrix(
    ["B", "B", "C", "A", "B", "B", "C", "A", "A", "B"], # actual
    ["A", "B", "B", "A", "C", "B", "C", "C", "A", "C"]  # predicted
)

# And The Output
print(cm)
[[2, 1, 0], [0, 2, 1], [1, 2, 1]]

cm = confusionmatrix(
    ["B", "B", "C", "A", "B", "B", "C", "A", "A", "B"], # actual
    ["A", "B", "B", "A", "C", "B", "C", "C", "A", "C"], # predicted
    normalize = True
)

# And The Output
print(cm)
[[0.2, 0.1, 0.0], [0.0, 0.2, 0.1], [0.1, 0.2, 0.1]]

# Actual & Predicted Classes
actual      = ["A", "B", "C", "C", "B", "C", "C", "B", "A", "A", "B", "A", "B", "C", "A", "B", "C"]
predicted   = ["A", "B", "B", "C", "A", "C", "A", "B", "C", "A", "B", "B", "B", "C", "A", "A", "C"]

# Initialize Performance Class
performance = Performance(actual, predicted)

# Print Confusion Matrix
performance.tabulate()

===================================
        Aᴬ      Bᴬ      Cᴬ

Aᴾ      3       2       1

Bᴾ      1       4       1

Cᴾ      1       0       4

Note: classᴾ = Predicted, classᴬ = Actual
===================================

# Print Normalized Confusion Matrix
performance.tabulate(normalized = True)

===================================
        Aᴬ      Bᴬ      Cᴬ

Aᴾ      17.65%  11.76%  5.88%

Bᴾ      5.88%   23.53%  5.88%

Cᴾ      5.88%   0.00%   23.53%

Note: classᴾ = Predicted, classᴬ = Actual
===================================

import numpy as np

def compute_confusion_matrix(true, pred):
  '''Computes a confusion matrix using numpy for two np.arrays
  true and pred.

  Results are identical (and similar in computation time) to: 
    "from sklearn.metrics import confusion_matrix"

  However, this function avoids the dependency on sklearn.'''

  K = len(np.unique(true)) # Number of classes 
  result = np.zeros((K, K))

  for i in range(len(true)):
    result[true[i]][pred[i]] += 1

  return result

import numpy as np

classes = 3
true = np.random.randint(0, classes, 50)
pred = np.random.randint(0, classes, 50)

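# minlength guarantees classes * classes bins even if some pairs never occur
np.bincount(true * classes + pred, minlength=classes * classes).reshape((classes, classes))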
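The bincount one-liner works because each (true, pred) pair is encoded as the single integer true * classes + pred; counting how often each integer occurs and reshaping the counts gives the square matrix. A small sketch comparing it against an explicit loop, assuming the labels are integers in range(classes) and using made-up sample arrays:

```python
import numpy as np

classes = 3
true = np.array([0, 1, 2, 2, 1, 0, 2])
pred = np.array([0, 2, 2, 1, 1, 0, 2])

# Vectorized version: one bin per (true, pred) pair.
fast = np.bincount(true * classes + pred,
                   minlength=classes * classes).reshape((classes, classes))

# Reference implementation: one increment per sample.
slow = np.zeros((classes, classes), dtype=int)
for t, p in zip(true, pred):
    slow[t, p] += 1

print(fast)
```

Rows of the result are the true classes and columns the predicted classes, since the true label is the high-order part of the encoded integer.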

def confusion_matrix(actual, predicted):
    classes       = np.unique(np.concatenate((actual,predicted)))
    confusion_mtx = np.empty((len(classes),len(classes)),dtype=int)
    for i,a in enumerate(classes):
        for j,p in enumerate(classes):
            confusion_mtx[i,j] = np.where((actual==a)*(predicted==p))[0].shape[0]
    return confusion_mtx

actual    = np.array([1,1,1,1,0,0,0,0])
predicted = np.array([1,1,1,1,0,0,0,1])
confusion_matrix(actual,predicted)

   0  1
0  3  1
1  0  4

actual    = np.array(["a","a","a","a","b","b","b","b"])
predicted = np.array(["a","a","a","a","b","b","b","a"])
confusion_matrix(actual,predicted)

   0  1
0  4  0
1  1  3

actual    = np.array(["a","a","a","a","b","b","b","b"])
predicted = np.array(["a","a","a","a","b","b","b","z"]) # <-- notice the 3rd class, "z"
confusion_matrix(actual,predicted)

   0  1  2
0  4  0  0
1  0  3  1
2  0  0  0

actual    = np.array(["a","a","a","x","x","b","b","b"]) # <-- notice the 4th class, "x"
predicted = np.array(["a","a","a","a","b","b","b","z"])
confusion_matrix(actual,predicted)

   0  1  2  3
0  3  0  0  0
1  0  2  0  1
2  1  1  0  0
3  0  0  0  0

def get_confusion_matrix(l1, l2):

    assert len(l1)==len(l2), "Two lists have different size."

    K = len(np.unique(l1))

    # create label-index value
    label_index = dict(zip(np.unique(l1), np.arange(K)))

    result = np.zeros((K, K))
    for i in range(len(l1)):
        result[label_index[l1[i]]][label_index[l2[i]]] += 1

    return result

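def confusionMatrix(actual, pred):
   # Both arguments must be boolean NumPy arrays (True = positive class),
   # because `actual` is used as a mask below.
   TP = (actual==pred)[actual].sum()    # positives predicted correctly
   TN = (actual==pred)[~actual].sum()   # negatives predicted correctly
   FP = (actual!=pred)[~actual].sum()   # negatives predicted as positive
   FN = (actual!=pred)[actual].sum()    # positives predicted as negative

   # Rows: actual positive, actual negative; columns: predicted positive, predicted negative.
   return [[TP, FN], [FP, TN]]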