Python sklearn KMeans ConvergenceWarning


python scikit-learn


I am using sklearn's KMeans clustering on a 1D dataset. When I run my code, I get a ConvergenceWarning:

ConvergenceWarning: Number of distinct clusters (<some integer n>) found smaller than n_clusters (<some integer bigger than n>). Possibly due to duplicate points in X.
  return_n_iter=True)

Sample output:

ConvergenceWarning: Number of distinct clusters (14) found smaller than n_clusters (15). Possibly due to duplicate points in X. return_n_iter=True)

Expected output:

no warning

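As you may notice, the input contains many duplicate values. This is expected. I would like to know how to cluster this data better, so that I don't end up with duplicate clusters that share duplicate centroids.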

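Ideally, the number of clusters you specify should not exceed the number of unique data points. If you adjust the centroid count accordingly, the warning is not raised.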
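One way to apply this, as a minimal sketch rather than the OP's actual code: count the unique values first and cap `n_clusters` at that count. The helper name `cluster_capped` and the sample data are made up for illustration; `n_init=10` and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_capped(values, requested_clusters):
    """Fit KMeans on 1D data, never asking for more clusters than unique values.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # KMeans expects a 2D array of shape (n_samples, n_features)
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    n_unique = np.unique(X).size
    k = min(requested_clusters, n_unique)
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return model, k


# Illustrative data: 14 unique values, each repeated three times
values = [v for v in range(14) for _ in range(3)]
model, k = cluster_capped(values, requested_clusters=15)
print(k)  # number of clusters actually requested from KMeans
```

Here 15 clusters are requested but only 14 unique values exist, so the helper asks KMeans for 14.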

sklearn uses the warnings module to issue its warnings. We can suppress the warning as follows:

import warnings

with warnings.catch_warnings():
    # Ignore every warning raised inside this block
    warnings.simplefilter("ignore")
    cluster_data(data_arr)

All warnings are suppressed inside the with block, so use this with caution.
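If silencing everything is too broad, the filter can be limited to one category. The sketch below ignores only `ConvergenceWarning` (imported from `sklearn.exceptions`) and lets other warnings through. The body of `cluster_data` here is a hypothetical stand-in, since the question does not show the original, and the sample data is made up.

```python
import warnings

import numpy as np
from sklearn.cluster import KMeans
from sklearn.exceptions import ConvergenceWarning


def cluster_data(data_arr, n_clusters=3):
    # Hypothetical stand-in for the question's cluster_data (original body not shown)
    X = np.asarray(data_arr, dtype=float).reshape(-1, 1)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)


# Illustrative data: only two unique values, but three clusters are requested
data_arr = [1.0, 1.0, 2.0, 2.0]

with warnings.catch_warnings():
    # Ignore only ConvergenceWarning; any other warning is still shown
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    model = cluster_data(data_arr, n_clusters=3)
```

This hides the message but does not change the result: the fit still cannot produce more distinct clusters than there are unique values.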

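"Exceed the number of unique data points" is an unclear statement. Do you mean variables or values? Also, with this quoted phrase, are you suggesting dimensionality reduction or feature selection? Again, the focus is missing. Try to improve this answer further.

The data in the MWE is one-dimensional, so neither dimensionality reduction nor feature selection applies. Also, since neither the OP nor I mentioned those ideas, shoehorning them into the answer would most likely confuse readers. "Unique data points" here means the number of unique examples (i.e., after removing any duplicate examples/points). As the OP mentioned, and as is evident from the MWE, he has many duplicate points. The OP's original dataset is structured as a multiset, but for the comparison against the number of clusters (i.e., n_clusters), sklearn converts that multiset into a set.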
