Python k-means clustering: how to access the data points in each cluster

python scikit-learn

Here is an implementation of the k-means algorithm that I put together from the scikit-learn KMeans documentation and a blog post discussing k-means:

#http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
#http://fromdatawithlove.thegovans.us/2013/05/clustering-using-scikit-learn.html

from sklearn.cluster import KMeans
import numpy as np
from matplotlib import pyplot

X = np.array([[10, 2, 9], [1, 4, 3], [1, 0, 3], [4, 2, 1],
              [4, 4, 7], [4, 0, 5], [4, 6, 3], [4, 1, 7],
              [5, 2, 3], [6, 3, 3], [7, 4, 13]])

# fit once, using k for the number of clusters
k = 3
kmeans = KMeans(n_clusters=k, random_state=0).fit(X)

labels = kmeans.labels_
centroids = kmeans.cluster_centers_

for i in range(k):
    # select only data observations with cluster label == i
    ds = X[np.where(labels==i)]
    # plot the data observations
    pyplot.plot(ds[:,0],ds[:,1],'o')
    # plot the centroids
    lines = pyplot.plot(centroids[i,0],centroids[i,1],'kx')
    # make the centroid x's bigger
    pyplot.setp(lines,ms=15.0)
    pyplot.setp(lines,mew=2.0)
pyplot.show()

print(kmeans.cluster_centers_.squeeze())

How can I print or access the data points in each of the k clusters? For example:

if k = 3 : 
cluster 1 : [10, 2 , 9], [1, 4 , 3], [1, 0 , 3]                  
cluster 2 : [4, 0 , 5], [4, 6 , 3],[4, 1 , 7],[5, 2 , 3],[6, 3 , 3],[7, 4 , 13]
cluster 3 : [4, 2 , 1], [4, 4 , 7]

From reading the documentation, there does not seem to be an attribute or method for this on the kmeans object.

Update:

kmeans.labels_ returns array([1, 0, 2, 0, 2, 0, 2, 0, 1], dtype=int32)

But how does this show the data points in each of the 3 clusters?
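
If you use the labels_ attribute of the fitted KMeans object, you get an array of cluster assignments, one per training vector. The labels array is in the same order as the training data, so you can zip the two together or call numpy.where() for each unique label.
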
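As a minimal sketch of the numpy.where() approach described above, the snippet below reuses the data from the question and collects the points of each cluster into a dictionary. The name `clusters` and the `n_init=10` setting are assumptions made for the illustration, not part of the original post.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[10, 2, 9], [1, 4, 3], [1, 0, 3], [4, 2, 1],
              [4, 4, 7], [4, 0, 5], [4, 6, 3], [4, 1, 7],
              [5, 2, 3], [6, 3, 3], [7, 4, 13]])

k = 3
# n_init is set explicitly here only to keep the run reproducible (assumption)
kmeans = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
labels = kmeans.labels_  # labels[j] is the cluster index of X[j]

# boolean mask: keep the rows of X whose label equals i
clusters = {i: X[labels == i] for i in range(k)}

# equivalent selection with numpy.where(), shown for cluster 0
idx = np.where(labels == 0)[0]  # row indices assigned to cluster 0
cluster0 = X[idx]

for i, points in clusters.items():
    print(f"cluster {i}:")
    print(points)
```

Which points land in which cluster index depends on the initialization, so the cluster numbering may differ from the example in the question.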
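To access the data points after k-means clustering, pair each point with its label and sort the pairs by label.

New code:

    # pair each data point with its cluster label, then sort by label
    result = zip(X, kmeans.labels_)
    sortedR = sorted(result, key=lambda x: x[1])
    print(sortedR)

Full code: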
    from sklearn.cluster import KMeans
    import numpy as np
    from matplotlib import pyplot

    X = np.array([[10, 2, 9], [1, 4, 3], [1, 0, 3], [4, 2, 1],
                  [4, 4, 7], [4, 0, 5], [4, 6, 3], [4, 1, 7],
                  [5, 2, 3], [6, 3, 3], [7, 4, 13]])

    # create the model once and fit it once
    k = 3
    kmeans = KMeans(n_clusters=k, random_state=0)
    kmeans.fit(X)

    labels = kmeans.labels_
    centroids = kmeans.cluster_centers_

    for i in range(k):
        # select only data observations with cluster label == i
        ds = X[np.where(labels==i)]
        # plot the data observations
        pyplot.plot(ds[:,0],ds[:,1],'o')
        # plot the centroids
        lines = pyplot.plot(centroids[i,0],centroids[i,1],'kx')
        # make the centroid x's bigger
        pyplot.setp(lines,ms=15.0)
        pyplot.setp(lines,mew=2.0)
    pyplot.show()

    # pair each data point with its cluster label, then sort by label
    result = zip(X, kmeans.labels_)
    sortedR = sorted(result, key=lambda x: x[1])
    print(sortedR)
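If a per-cluster listing like the one in the question is wanted instead of one sorted list, the zipped pairs can be grouped by label. This is only a sketch under assumptions: the name `grouped`, the use of `collections.defaultdict`, and the `n_init=10` setting are choices made for this illustration.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[10, 2, 9], [1, 4, 3], [1, 0, 3], [4, 2, 1],
              [4, 4, 7], [4, 0, 5], [4, 6, 3], [4, 1, 7],
              [5, 2, 3], [6, 3, 3], [7, 4, 13]])

# n_init is set explicitly here only to keep the run reproducible (assumption)
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)

# walk over (point, label) pairs and append each point to its cluster's list
grouped = defaultdict(list)
for point, label in zip(X, kmeans.labels_):
    grouped[int(label)].append(point.tolist())

for label in sorted(grouped):
    print(f"cluster {label}: {grouped[label]}")
```

Converting each row with `tolist()` gives plain Python lists, which print in the same style as the desired output in the question.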

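Comments:

It is not a method, it is not… Please look carefully at the documentation in the link.

@JackManey The closest I found was print(kmeans.labels_), print(kmeans.get_params), print(kmeans.cluster_centers_), but none of these attributes print the cluster values…

What exactly do you mean by "cluster values"?

@JackManey I now realize that "values" is ambiguous. By values I mean "data points"; I have updated the question.

Ah, in that case kmeans.labels_ gives you the cluster assignment for each corresponding data point (remember that the rows of a NumPy array are in a fixed order!).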