In Python, how do I extract the distances between points in a dendrogram?


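I'm performing hierarchical clustering in Python and have obtained the dendrogram plot. Is there a way to extract the distances between the closest points? For example, in my plot I want the distance between 7 and 8 (the closest pair), then the distance between 0 and 1, and so on. To generate the plot I used these functions: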

from scipy.cluster.hierarchy import linkage, dendrogram

linkage_matrix = linkage(dfP, method="single")

cluster_dict = dendrogram(linkage_matrix)

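When you do

Z = hierarchy.linkage(X, method='single')

the `Z` matrix contains everything you need. Each row is one merge and holds cluster1, cluster2, the distance between them, and the number of elements in the new cluster.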
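The sketch below reads the columns of `Z` in code. It uses hypothetical distances for 4 points, chosen so that no two distances tie. The code unpacks each row of `Z`. Because the question already calls `dendrogram`, it also checks that the link heights stored by `dendrogram(..., no_plot=True)` match the third column of `Z`. Treat it as an illustration rather than the answer's own method.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical condensed distances for 4 points, ordered
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
y = np.array([1.0, 4.0, 6.0, 3.0, 5.0, 2.0])
Z = linkage(y, method="single")

# Each row of Z is one merge: [cluster1, cluster2, distance, size]
merges = [(int(a), int(b), float(d), int(k)) for a, b, d, k in Z]
for a, b, d, k in merges:
    print(f"{a} + {b} at distance {d} ({k} elements)")

# dendrogram() stores one [y_left, h, h, y_right] entry per link in 'dcoord';
# h is the merge height, so the sorted heights equal Z[:, 2]
dn = dendrogram(Z, no_plot=True)
heights = sorted(dc[1] for dc in dn["dcoord"])
print(heights)
```

The `dcoord` entries are not guaranteed to follow the row order of `Z`, so the comparison sorts the heights first.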

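For example:

import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt
import seaborn as sns

# X: the condensed distance array of the 6 elements (not shown here)
Z = hierarchy.linkage(X, method='single')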

We get this `Z`:

array([[  2.,   5., 138.,   2.],
       [  3.,   4., 219.,   2.],
       [  0.,   7., 255.,   3.],
       [  1.,   8., 268.,   4.],
       [  6.,   9., 295.,   6.]])
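The answer does not show the `X` behind this `Z`. The sketch below assumes one: it is the 15-value condensed array used in SciPy's `dendrogram` documentation example (`ytdist`). Single linkage on that array gives exactly the `Z` printed above. Taking the third column then gives the merge distances directly.

```python
import numpy as np
from scipy.cluster import hierarchy

# Assumption: the answer's X is not shown; this condensed array
# (SciPy's `ytdist` docs example) yields exactly the Z printed above
X = np.array([662., 877., 255., 412., 996., 295., 468., 268.,
              400., 754., 564., 138., 219., 869., 669.])
Z = hierarchy.linkage(X, method="single")

expected = np.array([[2., 5., 138., 2.],
                     [3., 4., 219., 2.],
                     [0., 7., 255., 3.],
                     [1., 8., 268., 4.],
                     [6., 9., 295., 6.]])
print(np.allclose(Z, expected))

# The merge distances on their own are the third column
distances = Z[:, 2]
print(distances)
```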

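Since we only have 6 elements, 0 to 5 are the single elements, and the numbers from 6 onwards are clusters of elements:

6 is the first cluster (2,5), with 2 elements
7 is the second cluster (3,4), with 2 elements
8 is the third cluster (0,7), i.e. (0,(3,4)), with 3 elements
9 is the fourth cluster (1,8), i.e. (1,(0,(3,4))), with 4 elements

Finally, the last row merges (6,9), i.e. ((2,5),(1,(0,(3,4)))), which contains all 6 elements. So the cluster numbers map to their elements as follows: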

clusters = {
    0: '0',
    1: '1',
    2: '2',
    3: '3',
    4: '4',
    5: '5',
    6: '2,5',
    7: '3,4',
    8: '0,3,4',
    9: '1,0,3,4',
}
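The nesting above also tells you the height at which any two original points first land in the same cluster. This height is their cophenetic distance, which is the "distance between points" that a dendrogram actually displays. The sketch below uses SciPy's `cophenet` and `squareform`. It assumes the same 6-element input as above, which is an assumption because the answer's `X` is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

# Assumed input: the 6-element condensed array that reproduces the Z above
X = np.array([662., 877., 255., 412., 996., 295., 468., 268.,
              400., 754., 564., 138., 219., 869., 669.])
Z = linkage(X, method="single")

# cophenet(Z) returns condensed cophenetic distances; squareform turns them
# into an n x n matrix whose entry [i, j] is the height at which i and j
# first end up in the same cluster
coph = squareform(cophenet(Z))

print(coph[3, 4])  # merged directly in cluster 7
print(coph[0, 4])  # 0 joins (3,4) in cluster 8
print(coph[1, 2])  # only joined by the last merge, (6,9)
```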

Now we can build a DataFrame `df` to display the merge distances as a heatmap:

# init a float DataFrame of zeros:
# columns = first cluster of each merge, index = second cluster
df = pd.DataFrame(
    0.0,
    columns=Z[:, 0].astype(int),
    index=Z[:, 1].astype(int)
)

# rename cluster numbers to their elements
df.columns = df.columns.map(clusters)
df.index = df.index.map(clusters)

# put the distance of merge i in the diagonal cell (i, i)
for i, d in enumerate(Z[:, 2]):
    df.iloc[i, i] = d

# mask everything but the diagonal
mask = np.ones(df.shape, dtype=bool)
np.fill_diagonal(mask, False)

# plot the heatmap
sns.heatmap(df,
            annot=True, fmt='.0f', cmap="YlGnBu",
            mask=mask)
plt.show()
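The diagonal heatmap shows only n - 1 numbers. If you just need to read them, a plain table with one row per merge may be easier. The sketch below is an alternative presentation rather than part of the original answer, and its 4-point distances are hypothetical. It names each cluster id by its leaves, using the rule that row i of `Z` creates cluster n + i.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage

# Hypothetical condensed distances for 4 points: (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)
y = np.array([1.0, 4.0, 6.0, 3.0, 5.0, 2.0])
Z = linkage(y, method="single")
n = Z.shape[0] + 1  # a linkage matrix has n - 1 rows

# Name every cluster id by its leaves; row i of Z creates cluster n + i
names = {i: str(i) for i in range(n)}
for i, (a, b) in enumerate(Z[:, :2].astype(int)):
    names[n + i] = f"{names[a]},{names[b]}"

# One row per merge instead of a masked diagonal
merges = pd.DataFrame({
    "cluster1": [names[a] for a in Z[:, 0].astype(int)],
    "cluster2": [names[b] for b in Z[:, 1].astype(int)],
    "distance": Z[:, 2],
    "size": Z[:, 3].astype(int),
})
print(merges)
```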

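Update: I defined `X` as a distance array. Its values are those of the strictly lower triangular (nilpotent) matrix of distances between the elements, read column by column.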

We can verify this. We have

n = 6

elements, and this is the strictly lower triangular matrix of distances:

# init an n x n float DataFrame of zeros
df = pd.DataFrame(0.0, columns=range(int(n)), index=range(int(n)))
# fill the lower triangle column by column with the condensed distances
idx = 0
for c in range(int(n) - 1):
    for r in range(c + 1, int(n)):
        df.iloc[r, c] = X[idx]
        idx += 1
# mask the diagonal and the upper triangle
mask = np.triu(np.ones(df.shape, dtype=bool))
# plot the matrix
sns.heatmap(df, annot=True, fmt='.0f', cmap="YlGnBu", mask=mask)
plt.show()
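The manual loop above is exactly what `scipy.spatial.distance.squareform` does, except that `squareform` also fills the upper triangle. The sketch below checks this on hypothetical distances for 4 points, keeping only the lower part with `np.tril`.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Hypothetical condensed distances for n = 4 points
X = np.array([1.0, 4.0, 6.0, 3.0, 5.0, 2.0])
n = 4

# Manual fill, as in the loop above: lower triangle, column by column
manual = np.zeros((n, n))
idx = 0
for c in range(n - 1):
    for r in range(c + 1, n):
        manual[r, c] = X[idx]
        idx += 1

# squareform() rebuilds the full symmetric matrix; np.tril keeps the lower part
full = squareform(X)
lower = np.tril(full)
print(np.array_equal(manual, lower))
print(full)
```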

Update 2: how to automatically fill the map dictionary for the diagonal matrix of cluster distances

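First we need the number of elements n. This is only necessary when `X` is a distance array. A condensed distance array of n elements holds n(n-1)/2 values, so solving for n gives: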

# number of elements: solve X.size = n * (n - 1) / 2 for n
n = (np.sqrt(8 * X.size + 1) + 1) / 2
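The sketch below cross-checks the formula against two SciPy helpers that return the same count, using hypothetical input. `num_obs_y` works from the condensed array. `num_obs_linkage` reads n from `Z` itself, because `Z` always has n - 1 rows, so it also works when `X` is an observation matrix rather than a distance array.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, num_obs_linkage
from scipy.spatial.distance import num_obs_y

# A condensed array for n elements has m = n (n - 1) / 2 values,
# so n = (1 + sqrt(1 + 8 m)) / 2
X = np.array([1.0, 4.0, 6.0, 3.0, 5.0, 2.0])  # hypothetical, m = 6
n_formula = int(round((np.sqrt(8 * X.size + 1) + 1) / 2))

# SciPy helpers that return the same count
n_from_X = num_obs_y(X)        # from the condensed array
Z = linkage(X, method="single")
n_from_Z = num_obs_linkage(Z)  # from Z, which has n - 1 rows
print(n_formula, n_from_X, n_from_Z)
```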

Then we can loop through the `Z` matrix to fill the dictionary:

# clusters of single elements
clusters = {i: str(i) for i in range(int(n))}
# loop through the Z matrix: row i creates cluster number n + i
for i, z in enumerate(Z.astype(int)):
    cluster_num = int(n + i)
    # join the elements of the two merged clusters
    clusters[cluster_num] = ','.join([clusters[z[0]], clusters[z[1]]])

Evaluating

clusters

gives:

{0: '0',
 1: '1',
 2: '2',
 3: '3',
 4: '4',
 5: '5',
 6: '2,5',
 7: '3,4',
 8: '0,3,4',
 9: '1,0,3,4',
 10: '2,5,1,0,3,4'}
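SciPy can also build the same dictionary from its tree representation. The sketch below assumes the same 6-element input that reproduces `Z` (an assumption, since `X` is not shown). It uses `to_tree(Z, rd=True)`, which returns the root and a list of `ClusterNode` objects indexed by cluster id. `ClusterNode.pre_order()` lists the leaf ids under each node, left branch first, which matches the string order built by the loop above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

# Assumed input: the 6-element condensed array that reproduces the Z above
X = np.array([662., 877., 255., 412., 996., 295., 468., 268.,
              400., 754., 564., 138., 219., 869., 669.])
Z = linkage(X, method="single")

# rd=True also returns the list of nodes, where nodes[i] has id i;
# pre_order() returns the leaf ids below a node, left branch first
root, nodes = to_tree(Z, rd=True)
clusters = {node.id: ",".join(map(str, node.pre_order())) for node in nodes}
print(clusters)
```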

Could you provide your `dfP`?

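Thank you very much! Yes, dfP is a distance matrix.

You're very welcome. I added the distance matrix between the elements in case you need it.

@MaxPierini Thanks again, this is very clear.

@MaxPierini Sorry, one last question: how do I create the cluster dictionary needed to make the heatmap?

I updated the answer with a method to fill the dictionary.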
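Since `dfP` is a square distance matrix, one detail of the question's code matters. `scipy.cluster.hierarchy.linkage` treats a 2-D array as observation vectors, not as precomputed distances. A square distance matrix should therefore be converted to condensed form with `squareform` before it is passed in. The sketch below uses a hypothetical 4x4 `dfP` as a stand-in.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical square distance matrix standing in for dfP
dfP = pd.DataFrame(squareform([1.0, 4.0, 6.0, 3.0, 5.0, 2.0]))

# linkage() treats a 2-D array as observation vectors, not as distances,
# so the symmetric, zero-diagonal matrix is condensed first
condensed = squareform(dfP.to_numpy())
linkage_matrix = linkage(condensed, method="single")
print(linkage_matrix)
```

With the condensed input, the third column of `linkage_matrix` holds the merge distances taken from `dfP` itself.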
