Implementing hierarchical agglomerative clustering with NumPy


numpy cluster-computing


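I'm a beginner and I just want to implement hierarchical agglomerative clustering for an RGB image. To do this, I extract all the RGB values from the image. I then process the image, compute the pairwise distances, and build the linkage. Now I want to use the linkage to retrieve the original data (i.e. the RGB values) at the indices assigned to each cluster.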

image = Image.open('image.jpg')
image = image.convert('RGB')
im = np.array(image).reshape((-1,3))
rgb = list(im.getdata())
X = pdist(im)
Y = linkage(X)
I = inconsistent(Y)

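Based on the 4th column of the inconsistency matrix, I chose the minimum value as the cutoff in order to get the largest number of clusters.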

cutoff = 0.7
cluster_assignments = fclusterdata(Y, cutoff)
# Print the indices of the data points in each cluster.
num_clusters = cluster_assignments.max()
print "%d clusters" % num_clusters
indices = cluster_indices(cluster_assignments)
ind = np.array(enumerate(rgb))
for k, ind in enumerate(indices):
    print "cluster", k + 1, "is", ind
dendrogram(Y)

I get a result like this:

cluster 6 is [ 6 11]
cluster 7 is [ 9 12]
cluster 8 is [15]

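This means that cluster 6 contains the leaves with indices 6 and 11. Now I'm stuck on how to map these indices back to the original data (i.e. the RGB values), where each index refers to the RGB value of one pixel in the image. After that I have to generate a codebook to implement the agglomerative clustering. I don't know how to approach this task. I've read a lot, but I don't understand any of it.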

Here is my solution:

import numpy as np
from scipy.cluster import hierarchy

im = np.array([[54,101,9],[ 67,89,27],[ 67,85,25],[ 55,106,1],[ 52,108,0],
 [ 55,78,24],[ 19,57,8],[ 19,46,0],[ 95,110,15],[112,159,57],
 [ 67,118,26],[ 76,127,35],[ 74,128,30],[ 25,62,0],[100,120,9],
 [127,145,61],[ 48,112,25],[198,25,21],[203,11,10],[127,171,60],
 [124,173,45],[120,133,19],[109,137,18],[ 60,85,0],[ 37,0,0],
 [187,47,20],[127,170,52],[ 30,56,0]])

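# Cluster the pixels directly; fclusterdata computes distances and linkage internally.
groups = hierarchy.fclusterdata(im, 0.7)
# Sort the pixels by cluster label so the members of each cluster are contiguous.
idx_sorted = np.argsort(groups)
group_sorted = groups[idx_sorted]
im_sorted = im[idx_sorted]
# Find the positions where the label changes and split the sorted pixels there.
split_idx = np.where(np.diff(group_sorted) != 0)[0] + 1
np.split(im_sorted, split_idx)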

Output:

[array([[203,  11,  10],
       [198,  25,  21]]),
 array([[187,  47,  20]]),
 array([[127, 171,  60],
       [127, 170,  52]]),
 array([[124, 173,  45]]),
 array([[112, 159,  57]]),
 array([[127, 145,  61]]),
 array([[25, 62,  0],
       [30, 56,  0]]),
 array([[19, 57,  8]]),
 array([[19, 46,  0]]),
 array([[109, 137,  18],
       [120, 133,  19]]),
 array([[100, 120,   9],
       [ 95, 110,  15]]),
 array([[67, 89, 27],
       [67, 85, 25]]),
 array([[55, 78, 24]]),
 array([[ 52, 108,   0],
       [ 55, 106,   1]]),
 array([[ 54, 101,   9]]),
 array([[60, 85,  0]]),
 array([[ 74, 128,  30],
       [ 76, 127,  35]]),
 array([[ 67, 118,  26]]),
 array([[ 48, 112,  25]]),
 array([[37,  0,  0]])]
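
The question also mentions building a codebook. A minimal sketch of one way to do that is shown here, assuming the codebook is simply the mean colour of each cluster (the question does not define it). The function name build_codebook and the five sample pixels are illustrative only; linkage and fcluster are the standard scipy.cluster.hierarchy functions, called with the same settings that fclusterdata uses by default.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def build_codebook(pixels, cutoff=0.7):
    """Cluster (N, 3) RGB rows; return labels, members per cluster, mean colour per cluster.

    build_codebook is a hypothetical helper, not part of SciPy.
    """
    # Single linkage on Euclidean distances, the defaults used by fclusterdata.
    Z = linkage(pixels.astype(float), method="single", metric="euclidean")
    # One label per pixel row, in the same order as the rows of `pixels`.
    labels = fcluster(Z, t=cutoff, criterion="inconsistent")
    # Boolean masks map each cluster label back to the original RGB rows.
    members = {int(k): pixels[labels == k] for k in np.unique(labels)}
    # Assumption: the codebook entry of a cluster is its mean colour.
    codebook = {k: rows.mean(axis=0) for k, rows in members.items()}
    return labels, members, codebook


pixels = np.array([[54, 101, 9], [67, 89, 27], [67, 85, 25],
                   [203, 11, 10], [198, 25, 21]])
labels, members, codebook = build_codebook(pixels)
for k in sorted(members):
    print("cluster", k, "pixels:", members[k].tolist(), "mean colour:", codebook[k])
```

Because labels has one entry per row of pixels, pixels[labels == k] is all that is needed to go from a cluster id back to the RGB values; no separate index list has to be kept.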

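I have a lot of questions about your code. 1: Why do you reshape the image to (-1, 3), and what do -1 and 3 mean? 2: An ndarray object has no getdata() method. 3: Why do you call fclusterdata() on the return value of linkage()? I think it should be called on im. 4: What is the cluster_indices() function?

1: The original image has shape (4, 7, 3), but the pdist function takes a 2-D array, so I used reshape(-1, 3) to reshape it to (28, 3). 2: I wanted to use getdata() to get all the pixel values (RGB values). It could be used to map the cluster indices, but I don't know whether that is correct. 3: I read the hierarchical clustering example for MATLAB at this link. They applied fclusterdata() to the linkage output. 4: I'm not sure about the indices; by writing two lines of code I was only able to get the cluster indices based on fclusterdata. An example output of 112234441251111 means fcluster produced 5 clusters, with equal numbers denoting the same cluster.

Can you post the image data? Just the im array would be fine.

im array [[54 101 9][67 89 27][67 85 25][55 106 1][52 108 0][55 78 24][19 57 8][19 46 0][95 110 15][112 159 57][67 118 26][76 127 35][74 128 30][25 62 0][100 120 9][127 145 61][48 112 25][198 25 21][203 11 10][127 171 60][124 173 45][120 133 19][109 137 18][60 85 0][37 0 0][187 47 20][127 170 52][30 56 0]] Thank you for taking the time to work on this.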
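
The points raised in these comments can be illustrated with a small sketch. It assumes a random stand-in array of shape (4, 7, 3) in place of the real image, which was not posted as a file. It shows what reshape(-1, 3) does, and that in SciPy fcluster is the function that takes a linkage matrix, while fclusterdata takes the raw observations and runs pdist and linkage itself.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, linkage
from scipy.spatial.distance import pdist

# Stand-in for np.array(image): 4 rows x 7 columns x 3 channels (random values, assumption).
rng = np.random.default_rng(0)
image_array = rng.integers(0, 256, size=(4, 7, 3))

# -1 tells NumPy to infer that dimension: 4 * 7 = 28 pixels, each with 3 channels.
pixels = image_array.reshape(-1, 3)
print(pixels.shape)

# Explicit route: condensed distances -> linkage matrix -> flat cluster labels.
Z = linkage(pdist(pixels), method="single")
labels_from_linkage = fcluster(Z, t=0.7, criterion="inconsistent")

# Shortcut: fclusterdata performs the same steps on the raw observations.
labels_from_data = fclusterdata(pixels, 0.7)
print(np.array_equal(labels_from_linkage, labels_from_data))

# The labels follow pixel order, so they can be laid back onto the image grid.
label_image = labels_from_linkage.reshape(image_array.shape[:2])
```

With the defaults (single linkage, Euclidean metric, inconsistent criterion) both routes run the same computation, so passing a linkage matrix to fclusterdata, as in the question, would cluster the rows of that matrix instead of the pixels.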