Python: where does the following decision-surface plotting code hang?


python scikit-learn


As an exercise, I copy-pasted the decision-surface plotting code from the sklearn documentation, using the iris dataset:

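# Few differences from the original at the link below (two classes, some renamed vars):
# http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# `Loader` is my own data-loading module (not shown here).

# Parameters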
n_classes = 2
plot_colors = "rb"
plot_step = 0.02

# Get my X and y - each sample is a histogram with a binary class label.
X, y, positives = Loader.load_cluster_size_histograms_singular(m=115, upper=21, norm=False, display_plot=False, pretty_print=False)

my_features = [str(i+1) for i in range(X.shape[1])]
my_features[-1] = my_features[-1] + '+'
features = np.asarray(my_features)

# Load iris data
iris = load_iris()

# Keep only the first 100 samples, i.e. the first two classes
iris.data = iris.data[:100]
iris.target = iris.target[:100]

features = iris.feature_names  # Comment or uncomment as necessary

# Now asserting that my X and y do not contain np.nan or np.inf (wouldn't sklearn catch this though?)
# Also check for correct sizing. We're really running out of potential failures here.
# Note: `np.nan not in X[i]` is always True because NaN != NaN, so use np.isfinite instead.
for i in range(115):
    assert np.isfinite(X[i]).all()
    assert X[i].shape[0] == 21

# They do not. X and y are clean.

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # Set local_X = X[:, pair], local_y = y, features to my_features... BOOOOOM! 
    # CPU gets nuked, doesn't terminate.
    local_X = iris.data[:, pair]
    local_y = iris.target

    # Train
    clf = Pipeline(steps=[("scaling", StandardScaler()), ("classifier", LogisticRegression(verbose=100))])
    clf.fit(local_X, local_y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)

    x_min, x_max = local_X[:, 0].min() - 1, local_X[:, 0].max() + 1
    y_min, y_max = local_X[:, 1].min() - 1, local_X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

    plt.xlabel(features[pair[0]])
    plt.ylabel(features[pair[1]])

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(local_y == i)
        plt.scatter(local_X[idx, 0], local_X[idx, 1], c=color, label=features[i],
                    cmap=plt.cm.RdBu, edgecolor='black', s=15)

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")
plt.show()

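which gives the following output:

Fine, right? So the code itself is completely fine, apart from some renaming and a few minor bits and bobs added or removed. Absolutely no problem here.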

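My problem is that when I swap the iris dataset for my own dataset, the CPU gets completely destroyed on the line clf.fit(local_X, local_y). It doesn't matter which classifier I use: logistic regression, SVM, Gaussian NB, you name it. Everything slows to an unbelievable crawl, and clicks take tens of seconds to register. It doesn't terminate even after minutes of listening to my CPU being waterboarded. The only differences from the code above are that I set local_X = X[:, pair], local_y = y, and features = np.asarray(my_features) (where my_features is my own vector of feature names, converted to a numpy array).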

A screenshot of the CPU load on a MacBook Air with a 1.4 GHz Intel Core i5:

My dataset isn't large either: my own X and y have shapes (115, 21) and (115,). So the size of the data cannot be a factor.

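Now some Q/A for those who would rather criticise than help:

You're not scaling your input.

Wrong. Scaling is the first stage of my pipeline. I will say that my feature vectors are histograms. Alternatively, I tried scaling by making each histogram sum to 1. Exact same problem.

You're doing it wrong.

Brilliant observation. Can you explain exactly what I'm doing wrong?

Have you tried turning it off and on again?

How would that help? Yes, I have, although I see absolutely no reason why I should have to. I've restarted the kernel and started a new session. Same problem.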

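When I run my code with the classifier's verbosity set to 100, the only output I get is:

[LibLinear]

Not much, but that is all that gets printed. Thanks for any helpful comments, suggestions, and answers.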

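EDIT:

I've been asked to provide a representative sample of my dataset. As mentioned, the samples are histograms. An example might look like this (type np.array, element type np.float32):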

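UPDATE: After loading my dataset again with norm=True (meaning each histogram sums to 1, so my float values lie between 0 and 1, with no other normalization, i.e. without StandardScaler() in the pipeline), the code runs but gives a useless result:

I get a similarly odd result when using logistic regression with StandardScaler() included in the pipeline:

With norm=False, the complete hang still occurs. This is weird.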
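For readers unfamiliar with the sum-to-one scaling described above, here is a minimal sketch of what norm=True presumably does. The actual Loader implementation is not shown in this post, so `normalize_rows` is a hypothetical stand-in; the input is the sample histogram quoted in this post.

```python
import numpy as np

# The sample histogram quoted in this post (21 bins, float32).
hist = np.array([1515., 1072., 598., 447., 307., 221., 184., 166., 121.,
                 82., 76., 67., 69., 58., 39., 49., 40., 37., 24., 27., 590.],
                dtype=np.float32)


def normalize_rows(X):
    """Scale each row so that it sums to 1 (hypothetical stand-in for norm=True)."""
    X = np.asarray(X, dtype=np.float64)
    sums = X.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # leave all-zero rows unchanged instead of dividing by zero
    return X / sums


X_norm = normalize_rows(hist[np.newaxis, :])  # shape (1, 21)
print("row sum:", X_norm.sum(axis=1))
print("value range:", X_norm.min(), "to", X_norm.max())
```

After this scaling every feature lies in [0, 1], so the plotting code's padding of 1 on each side gives an axis span of at most 3, which is only about 150 grid points per axis at plot_step = 0.02. That is consistent with the code finishing for norm=True but not for the raw counts.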

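So I found the problem: it is not actually the fit() function that breaks. It's np.meshgrid()! Specifically, it breaks when the input values range over hundreds or thousands while the plot_step parameter is set to 0.02.

My guess is that when np.meshgrid() is called over such a range of values, the sheer number of coordinates makes it melt down completely. Once I started using a step that is more reasonable for my inputs (e.g. 100), it started working.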
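To make the scale of the problem concrete, the sketch below estimates how large the grid becomes without actually building it. The helper `grid_cost` and the two sets of axis bounds are illustrative assumptions, not values taken from the actual dataset.

```python
import numpy as np


def grid_cost(x_min, x_max, y_min, y_max, step, itemsize=8):
    """Estimate the size of the np.meshgrid output without allocating the grid.

    Returns (nx, ny, total_points, bytes_per_coordinate_array),
    assuming float64 coordinates (8 bytes each) by default.
    """
    nx = np.arange(x_min, x_max, step).size  # 1-D axes are cheap to build
    ny = np.arange(y_min, y_max, step).size
    total = nx * ny
    return nx, ny, total, total * itemsize


# Hypothetical axis bounds: one iris-like pair, one pair of raw histogram counts.
cases = {
    "iris-like features": (3.3, 8.9, 1.0, 5.4),
    "raw count features": (0.0, 1500.0, 0.0, 1000.0),
}
for name, bounds in cases.items():
    nx, ny, total, nbytes = grid_cost(*bounds, step=0.02)
    print(f"{name}: {nx} x {ny} = {total:,} points, "
          f"~{nbytes / 1e9:.3f} GB per coordinate array")
```

With a span of 1500 and a step of 0.02, one axis alone has about 75,000 points, so the full grid has billions of points and each of xx and yy would need tens of gigabytes. The classifier would then also be asked to predict on every one of those points.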

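It is pretty silly that np.meshgrid() does not warn about this kind of input. The load on my CPU hit 475% at one point, and I got no heads-up about it. Likewise, the sklearn documentation could mention that the plot_step parameter should be adjusted to the scale of the data.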
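One way to avoid tuning plot_step by hand is to fix the number of grid points per axis instead of the step size. The sketch below is one possible approach, not what the sklearn example does; `make_grid`, `n_points`, and `pad_frac` are hypothetical names, and the data is synthetic.

```python
import numpy as np


def make_grid(X2d, n_points=200, pad_frac=0.05):
    """Build a plotting grid with a fixed number of points per axis.

    X2d is an (n_samples, 2) array. The grid covers the data range plus
    a relative padding, so its size does not depend on the feature scale.
    """
    X2d = np.asarray(X2d, dtype=float)
    mins = X2d.min(axis=0)
    maxs = X2d.max(axis=0)
    pad = pad_frac * (maxs - mins)
    pad[pad == 0] = 1.0  # constant feature: fall back to an absolute padding
    xs = np.linspace(mins[0] - pad[0], maxs[0] + pad[0], n_points)
    ys = np.linspace(mins[1] - pad[1], maxs[1] + pad[1], n_points)
    return np.meshgrid(xs, ys)


# Synthetic data on two very different scales (illustrative only).
rng = np.random.default_rng(0)
X_small = rng.uniform(0.0, 5.0, size=(115, 2))
X_large = rng.uniform(0.0, 1500.0, size=(115, 2))

for name, X2d in [("small scale", X_small), ("large scale", X_large)]:
    xx, yy = make_grid(X2d)
    print(name, "grid shape:", xx.shape)
```

In the plotting loop of the question, this would replace the two np.arange calls: `xx, yy = make_grid(local_X)`, followed by the same `clf.predict(np.c_[xx.ravel(), yy.ravel()])` and reshape. The grid then always has n_points squared entries, whatever the feature scale.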

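Can you share your dataset? If not, are there any unusual elements in your data, such as inf or NaN? Can you post your code and show how you replaced the Iris dataset with your own data?

@troymyname00 I have checked manually and with the assertions shown above: the answer is no, the dataset is clean. I could be using ints in my case, but that should absolutely not be a problem.

Then please edit your question and remove the incorrect diagnosis and all the unnecessary code. You could have diagnosed where it hangs with a simple print statement; as it stands, this is not a reusable resource.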
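The print-statement diagnosis suggested in the last comment can be sketched as follows. The helper `timed` is a hypothetical name, and the steps being timed are small numpy stand-ins rather than the actual code from the question.

```python
import time

import numpy as np


def timed(label, func, *args, **kwargs):
    """Call func, printing a line before it starts and after it finishes.

    The 'starting' line is flushed immediately, so if a step hangs,
    its label is the last thing visible in the output.
    """
    print(f"starting {label} ...", flush=True)
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"finished {label} in {elapsed:.3f} s", flush=True)
    return result


# Small stand-in steps; the grid here is deliberately tiny.
xs = timed("np.arange", np.arange, 0.0, 10.0, 0.02)
xx, yy = timed("np.meshgrid", np.meshgrid, xs, xs)
print("grid shape:", xx.shape)
```

In the loop from the question, wrapping the calls to clf.fit, np.meshgrid, and clf.predict in this way would show which label is printed last before the hang.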

[1515. 1072.  598.  447.  307.  221.  184.  166.  121.   82.   76.   67.   69.   58.   39.   49.   40.   37.   24.   27.  590.]