Python: Plotting the decision surface of a classification decision tree with 3 features on a 2D plot

python matplotlib machine-learning scikit-learn


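My problem is that I have 3 features, but I only want to draw a 2D plot that uses 2 features at a time, and to show every possible combination.

The catch is that I called classifier.fit(X_train, Y_train), so the classifier expects 3 features, not just 2. X_train has shape (70, 3), i.e. (n_samples, n_features).

So far I have adapted the original code by adding z_min and z_max, because I do need the third feature in order to call classifier.predict().

The error I get at the plt.contourf call is: Input z must be a 2D array.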

import numpy as np
import matplotlib as pl
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx

x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
z_min, z_max = X_train[:, 2].min() - 1, X_train[:, 2].max() + 1

xx, yy, zz = np.meshgrid(np.arange(x_min, x_max, 0.1),
                 np.arange(y_min, y_max, 0.1),
                 np.arange(z_min, z_max, 0.1))

fig, ax = plt.subplots()

# here "model" is your model's prediction (classification) function
Z = classifier.predict(np.c_[np.c_[xx.ravel(), yy.ravel()], zz.ravel()])

# Put the result into a color plot
Z = Z.reshape(len(Z.shape), 2)
plt.contourf(xx, yy, Z, cmap=pl.cm.Paired)
plt.axis('off')

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=pl.cm.Paired)

print(Z.shape)

(4612640,)

print(xx.shape)

(20, 454, 508)

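How can I train on 3 features but plot only 2 of them, while keeping the plotted array in the correct shape? How do I get the array into the right size?

The code above is what I have tried so far.

I want something like the usual decision surface plots, but every example I have seen trains on only 2 features and predicts from 2 values, not 3 as in my case. As far as I understand, those examples work because they never run into my problem of the array having the wrong shape.

Would it also be possible to visualize this with a 3D plot, so that all 3 features can be seen?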

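I don't think shape/size is the main problem here. Some calculation is needed before you can plot a 2D decision surface (contourf) for a 3D feature space. A correct contour plot requires that every pair (X, Y) has a single defined value (Z). Take your example and look only at xx and yy: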

import pandas as pd

df = pd.DataFrame({'x': xx.ravel(),
                   'y': yy.ravel(),
                   'Class': Z.ravel()})
xy_summ = df.groupby(['x', 'y']).agg(lambda x: x.value_counts().to_dict())
xy_summ = (xy_summ.drop('Class', axis=1)
                  .reset_index()
                  .join(pd.DataFrame(list(xy_summ.Class)))
                  .fillna(0))
xy_summ[[0, 1, 2]] = xy_summ[[0, 1, 2]].astype(int)
xy_summ.head()

You will find that for each pair of xx and yy you get 2 to 3 possible classes, depending on the value of zz:

    xx  yy  0   1   2
0   3.3 1.0 25  15  39
1   3.3 1.1 25  15  39
2   3.3 1.2 25  15  39
3   3.3 1.3 25  15  39
4   3.3 1.4 25  15  39
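The counts in the table above can also be seen directly from the array shapes. The following is a minimal NumPy-only sketch, not the code from the question: toy_predict is a hypothetical stand-in for classifier.predict, and the grid sizes are made up to keep the arrays small.

```python
import numpy as np

def toy_predict(points):
    """Hypothetical stand-in for classifier.predict; the class depends on all 3 features."""
    return (points.sum(axis=1) > 1.5).astype(int) + (points[:, 2] > 0.8).astype(int)

xs = np.linspace(0, 1, 4)   # 4 grid values for feature 0
ys = np.linspace(0, 1, 3)   # 3 grid values for feature 1
zs = np.linspace(0, 1, 5)   # 5 grid values for feature 2
xx, yy, zz = np.meshgrid(xs, ys, zs)   # default "xy" indexing: each has shape (3, 4, 5)

# One prediction per (x, y, z) grid cell, returned as a flat 1D array
Z = toy_predict(np.c_[xx.ravel(), yy.ravel(), zz.ravel()])
Z3 = Z.reshape(xx.shape)               # back to the 3D grid shape

# For every (x, y) pair, count how many distinct classes occur along the z axis
n_classes_per_pair = np.apply_along_axis(lambda v: np.unique(v).size, 2, Z3)

print(Z.shape, Z3.shape, xx[:, :, 0].shape)
print(n_classes_per_pair)
```

Z holds one value per 3D grid cell, so it cannot be reshaped to the 2D shape that contourf expects until the z axis has been reduced in some way.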

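So, to make a 2D contourf work, you have to decide which Z to call out of the 2 or 3 possibilities. For example, you could use a weighted class call such as: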

xy_summ['weighed_class'] = (xy_summ[1] + 2 * xy_summ[2]) / xy_summ[[0, 1, 2]].sum(1)

This lets you produce a working 2D plot:

import itertools
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl

iris = load_iris()
X = iris.data[:, 0:3]
Y = iris.target
clf = DecisionTreeClassifier().fit(X, Y)

plot_step = 0.1
a, b, c = np.hsplit(X, 3)
ar = np.arange(a.min()-1, a.max()+1, plot_step)
br = np.arange(b.min()-1, b.max()+1, plot_step)
cr = np.arange(c.min()-1, c.max()+1, plot_step)
aa, bb, cc = np.meshgrid(ar, br, cr)
Z = clf.predict(np.c_[aa.ravel(), bb.ravel(), cc.ravel()])
datasets = [[0, len(ar), aa],
            [1, len(br), bb],
            [2, len(cr), cc]]

for i, (xsets, ysets) in enumerate(itertools.combinations(datasets, 2)):
    xi, xl, xx = xsets
    yi, yl, yy = ysets
    df = pd.DataFrame({'x': xx.ravel(),
                       'y': yy.ravel(),
                       'Class': Z.ravel()})
    xy_summ = df.groupby(['x', 'y']).agg(lambda x: x.value_counts().to_dict())
    xy_summ = (xy_summ.drop('Class', axis=1)
                      .reset_index()
                      .join(pd.DataFrame(list(xy_summ.Class)))
                      .fillna(0))
    xy_summ['weighed_class'] = (xy_summ[1] + 2 * xy_summ[2]) / xy_summ[[0, 1, 2]].sum(1)
    xyz = (xy_summ.x.values.reshape(xl, yl),
           xy_summ.y.values.reshape(xl, yl),
           xy_summ.weighed_class.values.reshape(xl, yl))

    ax = plt.subplot(1, 3, i + 1)
    ax.contourf(*xyz, cmap=mpl.cm.Paired)
    ax.scatter(X[:, xi], X[:, yi], c=Y, cmap=mpl.cm.Paired, edgecolor='black')
    ax.set_xlabel(iris.feature_names[xi])
    ax.set_ylabel(iris.feature_names[yi])

plt.show()
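When the class labels are 0, 1 and 2, the weighted class above is the mean label along the feature that is left out, so it can also be computed with a reshape instead of a groupby. The sketch below shows that and two other possible reductions (a majority vote and a single slice). The coarse step, the random_state and the choice of the median slice are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, Y = iris.data[:, :3], iris.target
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)

step = 0.5  # coarse grid, chosen only to keep the example small
axes = [np.arange(X[:, k].min() - 1, X[:, k].max() + 1, step) for k in range(3)]
# indexing="ij" gives grids of shape (len(axes[0]), len(axes[1]), len(axes[2]))
aa, bb, cc = np.meshgrid(*axes, indexing="ij")
Z3 = clf.predict(np.c_[aa.ravel(), bb.ravel(), cc.ravel()]).reshape(aa.shape)

# 1) Mean label over the third feature (equals weighed_class for labels 0, 1, 2)
Z_mean = Z3.mean(axis=2)

# 2) Majority vote over the third feature
counts = np.stack([(Z3 == k).sum(axis=2) for k in clf.classes_], axis=-1)
Z_vote = clf.classes_[counts.argmax(axis=-1)]

# 3) One slice: third feature fixed at the grid value nearest to its median
k_med = np.abs(axes[2] - np.median(X[:, 2])).argmin()
Z_slice = Z3[:, :, k_med]

print(Z_mean.shape, Z_vote.shape, Z_slice.shape)
```

Each of the three results is 2D and has the same shape as aa[:, :, 0] and bb[:, :, 0], so any of them can be passed to contourf together with those two arrays. Which reduction is appropriate depends on what the plot is meant to show.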

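If I understand this correctly, "visualizing it with a 3D plot" will be difficult. You not only have 3 features, which already makes it 3D, but also a class call. In the end you effectively have to deal with 4D data, or with density-like data in 3D space. I guess that may be why plots of a 3D decision space (it is no longer really a surface) are not very common.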
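One rough way to look at such density-like data is to colour a coarse 3D grid of points by predicted class. This is only a sketch under assumptions: the grid resolution (12 points per axis), the transparency and the marker sizes are arbitrary choices, and the resulting plot gets hard to read as the grid gets denser.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, Y = iris.data[:, :3], iris.target
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)

# Coarse 3D grid over the range of the three features, flattened to (n_points, 3)
axes = [np.linspace(X[:, k].min(), X[:, k].max(), 12) for k in range(3)]
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
pred = clf.predict(grid)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Faint grid points show the predicted class in each region of the feature space
ax.scatter(grid[:, 0], grid[:, 1], grid[:, 2], c=pred, cmap="Paired", alpha=0.15, s=15)
# Training points drawn on top with a black edge
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=Y, cmap="Paired", edgecolor="black")
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_zlabel(iris.feature_names[2])
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure.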

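The thing is, to create a 2D decision surface plot you need to pick just two features. That means the fitting is done using 2 features. That is the best way to go and the gold-standard approach.

Thanks! I had to change this line because it triggered an error: xy_sum['weighted_class'] = (xy_sum.iloc[1] + 2 * xy_sum.iloc[2]) / xy_sum.iloc[[0, 1, 2]].sum(1). But I still wonder how you came up with the weighting function? How do I know it is the right function for my case?

@Mymozaaa I don't know. I made it up just to show you the concept/principle. I guess only you know what the right approach is for your problem.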
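The first comment's suggestion, fitting a separate tree on each pair of features, could look roughly like the sketch below. Note that this draws the decision surfaces of three different 2-feature models, not of the single 3-feature classifier from the question. The surfaces dictionary and random_state are illustrative assumptions.

```python
import itertools
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, Y = iris.data[:, :3], iris.target

surfaces = {}
for xi, yi in itertools.combinations(range(3), 2):
    pair = X[:, [xi, yi]]
    # This tree is trained on 2 features only, so it can predict from 2 values
    clf = DecisionTreeClassifier(random_state=0).fit(pair, Y)
    xx, yy = np.meshgrid(np.arange(pair[:, 0].min() - 1, pair[:, 0].max() + 1, 0.1),
                         np.arange(pair[:, 1].min() - 1, pair[:, 1].max() + 1, 0.1))
    # A 2D grid gives one prediction per (x, y) pair, so the reshape is unambiguous
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    surfaces[(xi, yi)] = (xx, yy, Z)
```

Each entry of surfaces can be passed straight to contourf, for example plt.contourf(*surfaces[(0, 1)]).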