Python: finding the shortest distance between clusters


I have 10 data points in a 2-D array:

array([[74, 89],
   [31, 55],
   [89, 74],
   [73, 20],
   [95, 35],
   [93, 82],
   [47, 81],
   [21, 83],
   [78, 54],
   [39, 45]])
I computed the distances between them with sklearn's NearestNeighbors. I also divided them into 5 clusters. My distance array looks like this:

array([[ 0.  , 20.25, 21.21, 28.16, 35.23, 53.34, 54.82, 56.22, 57.94, 69.01],
   [ 0.  , 12.81, 29.73, 30.53, 47.01, 54.67, 54.82, 61.03, 67.05, 67.62],
   [ 0.  ,  8.94, 21.21, 22.83, 39.46, 42.58, 56.32, 57.8 , 61.03, 68.59],
   [ 0.  , 26.63, 34.37, 42.2 , 54.67, 56.32, 65.15, 66.31, 69.01, 81.69],
   [ 0.  , 25.5 , 26.63, 39.46, 47.04, 56.89, 57.94, 66.48, 67.05, 88.2 ],
   [ 0.  ,  8.94, 20.25, 31.76, 46.01, 47.04, 65.15, 65.46, 67.62, 72.01],
   [ 0.  , 26.08, 28.16, 30.53, 36.88, 41.11, 42.58, 46.01, 66.31, 66.48],
   [ 0.  , 26.08, 29.73, 42.05, 53.34, 63.95, 68.59, 72.01, 81.69, 88.2 ],
   [ 0.  , 22.83, 25.5 , 31.76, 34.37, 35.23, 40.02, 41.11, 47.01, 63.95],
   [ 0.  , 12.81, 36.88, 40.02, 42.05, 42.2 , 56.22, 56.89, 57.8 , 65.46]])
Note that the first element of every sub-array is 0, because it is the distance from a point to itself.
In my example, every two points form one cluster (for example, point 1 and point 2 = cluster 1, point 3 and point 4 = cluster 2, and so on).

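How can I find the shortest distance between any two clusters, together with the points involved? E.g. the shortest distance between any of the 5 clusters is between point 1 (cluster 1) and point 6 (cluster 3).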

The following should solve your problem.

Step 1. Prepare the data

import numpy as np

x = np.array([[74, 89],
   [31, 55],
   [89, 74],
   [73, 20],
   [95, 35],
   [93, 82],
   [47, 81],
   [21, 83],
   [78, 54],
   [39, 45]])

# Random cluster labels for illustration; the listing below shows one possible draw
clusters = np.random.choice([0,1,2,3,4],10, p = [.2,.2,.2,.2,.2])
x = list(zip(x, clusters))
x
[(array([74, 89]), 0),
 (array([31, 55]), 3),
 (array([89, 74]), 2),
 (array([73, 20]), 2),
 (array([95, 35]), 4),
 (array([93, 82]), 4),
 (array([47, 81]), 3),
 (array([21, 83]), 1),
 (array([78, 54]), 1),
 (array([39, 45]), 3)]
Step 2. Compute the distances

from scipy.spatial import distance
dist = []
for i in range(len(x)):
    for j in range(i+1, len(x)):
        xi = x[i]
        xj = x[j]
        if xi[1] > xj[1]:
            xi, xj = xj, xi
        dist.append((xi,xj, distance.euclidean(xi[0], xj[0])))
dist
[((array([74, 89]), 0), (array([31, 55]), 3), 54.817880294662984),
 ((array([74, 89]), 0), (array([89, 74]), 2), 21.213203435596427),
 ((array([74, 89]), 0), (array([73, 20]), 2), 69.00724599634447),
 ((array([74, 89]), 0), (array([95, 35]), 4), 57.9396237474839),
 ((array([74, 89]), 0), (array([93, 82]), 4), 20.248456731316587),
 ((array([74, 89]), 0), (array([47, 81]), 3), 28.160255680657446),
 ((array([74, 89]), 0), (array([21, 83]), 1), 53.33854141237835),
 ((array([74, 89]), 0), (array([78, 54]), 1), 35.22782990761707),
 ((array([74, 89]), 0), (array([39, 45]), 3), 56.22277118748239),
 ((array([89, 74]), 2), (array([31, 55]), 3), 61.032778078668514),
 ((array([73, 20]), 2), (array([31, 55]), 3), 54.67174773134658),
 ((array([31, 55]), 3), (array([95, 35]), 4), 67.05221845696084),
...
Each entry in the result has the form
[point1, point2, distance]
where point1 and point2 are each stored as
[coordinates, cluster_num]
with the lower-numbered cluster first.
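
To make the record layout concrete, here is a minimal sketch of how one entry can be unpacked. It is a standard-library variant of Step 2 (`itertools.combinations` and `math.dist`, Python 3.8+), not the `scipy` call used above. The labels are copied from the sample output in Step 1, and the name `Pair` is made up for this illustration.

```python
from itertools import combinations
from math import dist
from typing import NamedTuple


class Pair(NamedTuple):  # hypothetical helper, not part of the answer above
    a: tuple         # (coordinates, cluster) with the smaller cluster label
    b: tuple         # (coordinates, cluster) with the larger cluster label
    distance: float  # Euclidean distance between the two points


points = [(74, 89), (31, 55), (89, 74), (73, 20), (95, 35),
          (93, 82), (47, 81), (21, 83), (78, 54), (39, 45)]
labels = [0, 3, 2, 2, 4, 4, 3, 1, 1, 3]  # copied from the Step 1 listing
labelled = list(zip(points, labels))

pairs = []
for p, q in combinations(labelled, 2):
    if p[1] > q[1]:  # order each pair by cluster label, as Step 2 does
        p, q = q, p
    pairs.append(Pair(p, q, dist(p[0], q[0])))

# Unpack the first record into its five pieces.
(coords_a, cluster_a), (coords_b, cluster_b), d = pairs[0]
print(coords_a, cluster_a, coords_b, cluster_b, round(d, 2))
# (74, 89) 0 (31, 55) 3 54.82
```
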

Step 3. For every combination of clusters, select the pair of points with the shortest distance

clust_unique = []
for i in range(5):
    for j in range(i+1,5):
        clust_unique.append((i,j))

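minimum_distance = []
for c in clust_unique:
    # keep only the pairs whose cluster labels match this combination
    candidates = [d for d in dist if d[0][1] == c[0] and d[1][1] == c[1]]
    if candidates:  # a random labelling may leave a cluster empty; min([]) would raise
        minimum_distance.append(min(candidates, key=lambda d: d[2]))
minimum_distance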
[((array([74, 89]), 0), (array([78, 54]), 1), 35.22782990761707),
 ((array([74, 89]), 0), (array([89, 74]), 2), 21.213203435596427),
 ((array([74, 89]), 0), (array([47, 81]), 3), 28.160255680657446),
 ((array([74, 89]), 0), (array([93, 82]), 4), 20.248456731316587),
 ((array([78, 54]), 1), (array([89, 74]), 2), 22.825424421026653),
 ((array([21, 83]), 1), (array([47, 81]), 3), 26.076809620810597),
 ((array([78, 54]), 1), (array([95, 35]), 4), 25.495097567963924),
 ((array([73, 20]), 2), (array([39, 45]), 3), 42.20189569201838),
 ((array([89, 74]), 2), (array([93, 82]), 4), 8.94427190999916),
 ((array([47, 81]), 3), (array([93, 82]), 4), 46.010868281309364)]
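
The loop in Step 3 scans the whole distance list once per cluster combination. As an alternative sketch (not the method used above), a dictionary can keep a running minimum per cluster pair during a single pass over the point pairs. This version uses only the standard library, copies the labels from the Step 1 listing, and the name `best` is hypothetical.

```python
from itertools import combinations
from math import dist

points = [(74, 89), (31, 55), (89, 74), (73, 20), (95, 35),
          (93, 82), (47, 81), (21, 83), (78, 54), (39, 45)]
labels = [0, 3, 2, 2, 4, 4, 3, 1, 1, 3]  # copied from the Step 1 listing

best = {}  # (smaller label, larger label) -> (distance, point, point)
for (p, cp), (q, cq) in combinations(zip(points, labels), 2):
    if cp == cq:
        continue  # both points lie in the same cluster
    if cp > cq:   # store the lower-numbered cluster first
        p, q, cp, cq = q, p, cq, cp
    d = dist(p, q)
    if (cp, cq) not in best or d < best[(cp, cq)][0]:
        best[(cp, cq)] = (d, p, q)

for key in sorted(best):
    d, p, q = best[key]
    print(key, p, q, round(d, 2))
```

For these labels the dictionary ends up with the same ten closest pairs as the Step 3 listing, for example `(89, 74)` and `(93, 82)` at about 8.94 for clusters 2 and 4.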

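You may want to add a listing that shows what you have done and the output you expect.

@Sergey Bushmanov Makes sense! I just did not know how to approach this, which is why I posted it here. I am looking for the two points (from two different clusters) with the shortest distance, and the distance between them. So the ideal output would be
[point 1, point 6, distance]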
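
For that single-result format, a minimal sketch could look like the following. It assumes the labelling described in the question (every two consecutive points form one cluster), numbers the points from 1, and uses only the standard library (`math.dist` needs Python 3.8+). The names `cross` and `result` are made up for the example.

```python
from itertools import combinations
from math import dist

points = [(74, 89), (31, 55), (89, 74), (73, 20), (95, 35),
          (93, 82), (47, 81), (21, 83), (78, 54), (39, 45)]

# Assumption: points 1-2 are one cluster, points 3-4 the next, and so on.
labels = [i // 2 for i in range(len(points))]

# All pairs whose points come from different clusters, as
# (distance, first point number, second point number), numbered from 1.
cross = [
    (dist(points[i], points[j]), i + 1, j + 1)
    for i, j in combinations(range(len(points)), 2)
    if labels[i] != labels[j]
]

d, i, j = min(cross)  # tuples compare by distance first
result = [i, j, round(d, 2)]
print(result)  # [3, 6, 8.94]
```

With the ten points from the question, the closest cross-cluster pair comes out as points 3 and 6, which matches the 8.94 entry in the question's distance array.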