Minimum edge-to-edge Euclidean distance between labeled components in a numpy array

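I have a number of distinct forms (patches) in a large numpy array, and I want to calculate the edge-to-edge Euclidean distance between them using numpy and scipy.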

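Note: I did search, and this differs from previous questions on Stack Overflow, because I want the minimum distance between labeled patches within a single array, not between points or between separate arrays as the other questions asked.

My current approach uses a KDTree, but it is extremely inefficient for large arrays. Essentially, I look up the coordinates of each labeled component and compute the distance to every other component. Finally, the average minimum distance is calculated as an example.

I am looking for a smarter approach in Python, preferably without any extra modules.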

import numpy
from scipy import spatial
from scipy import ndimage

# Testing array
a = numpy.zeros((8,8), dtype=int)  # numpy.int was removed in NumPy 1.24
a[2,2] = a[3,1] = a[3,2] = 1
a[2,6] = a[2,7] = a[1,6] = 1
a[5,5] = a[5,6] = a[6,5] = a[6,6] = a[7,5] = a[7,6] = 1    

# label it
labeled_array,numpatches = ndimage.label(a)

# For each patch, find the closest pixel of every other patch
closest_points = []
for patch in range(1, numpatches + 1):
    # Get coordinates of first patch
    x,y = numpy.where(labeled_array==patch)
    coords = numpy.vstack((x,y)).T # transform into an (n, 2) array
    # Build a KD-tree of the coords of the first patch
    mt = spatial.cKDTree(coords)

    for patch2 in range(1, numpatches + 1):
        if patch == patch2: # If patch is the same as the first, skip
            continue
        # Get coordinates of second patch
        x2,y2 = numpy.where(labeled_array==patch2)
        coords2 = numpy.vstack((x2,y2)).T

        # Query all points of the second patch at once
        dist, indexes = mt.query(coords2)
        # The point with the smallest distance is the closest point
        closest_points.append(coords2[numpy.argmin(dist)])


# The average euclidean distance can then be calculated like this:
spatial.distance.pdist(closest_points,metric = "euclidean").mean()

Edit: I just tested the solution proposed by @morningsun, and it is a huge speed improvement. However, the returned values differ slightly:

# Consider for instance the following array
a = numpy.zeros((8,8), dtype=int)
a[2,2] = a[2,6] = a[5,5] = 1

labeled_array, numpatches = ndimage.label(a)

# Previous approach using KDtrees and pdist
# (kd is assumed to wrap the loop above and return closest_points)
b = kd(labeled_array,numpatches)
spatial.distance.pdist(b,metric = "euclidean").mean()
#> 3.0413115592767102

# New approach using the lower matrix and selecting only lower distances
b = numpy.tril( feature_dist(labeled_array) )
b[b == 0 ] = numpy.nan
numpy.nanmean(b)
#> 3.8016394490958878

Edit 2

Ah, never mind, I figured it out. spatial.distance.pdist did not return the correct distance matrix, so the values were wrong.
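A likely reason for the discrepancy can be sketched as follows. This assumes that the undefined helper kd() returns the closest_points list built by the KDTree loop. That loop appends one closest point per ordered pair of patches, so every single-pixel patch appears twice in the list. Running pdist on that list then mixes in zero distances between duplicates. The variable names below are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist

# The three single-pixel patches from the example array
pts = np.array([[2, 2], [2, 6], [5, 5]])

# Assumption: the loop appends, for each ordered pair (patch, patch2),
# the pixel of patch2 that is closest to patch.
closest_points = np.array(
    [pts[j] for i in range(3) for j in range(3) if i != j]
)

# pdist over 6 points (each pixel twice) includes 3 zero distances
mean_closest = pdist(closest_points).mean()

# pdist over the 3 distinct pixels gives the true pairwise distances
mean_true = pdist(pts).mean()

print(mean_closest, mean_true)
```

Under these assumptions the first value reproduces the 3.04 figure and the second reproduces the 3.80 figure reported above.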

Here is a fully vectorized way to find the distance matrix of the labeled objects:

import numpy as np
from scipy.spatial.distance import cdist

def feature_dist(input):
    """
    Takes a labeled array as returned by scipy.ndimage.label and 
    returns an inter-feature distance matrix.
    """
    # Coordinates and labels of all nonzero (labeled) pixels
    I, J = np.nonzero(input)
    labels = input[I,J]
    coords = np.column_stack((I,J))

    # Sort pixels so that each feature occupies a contiguous block
    sorter = np.argsort(labels)
    labels = labels[sorter]
    coords = coords[sorter]

    # Squared distances between all pairs of nonzero pixels
    sq_dists = cdist(coords, coords, 'sqeuclidean')

    # Index of the first pixel of each feature, then block-wise minima
    start_idx = np.flatnonzero(np.r_[1, np.diff(labels)])
    nonzero_vs_feat = np.minimum.reduceat(sq_dists, start_idx, axis=1)
    feat_vs_feat = np.minimum.reduceat(nonzero_vs_feat, start_idx, axis=0)

    return np.sqrt(feat_vs_feat)

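This approach needs O(N²) memory, where N is the number of nonzero pixels. If that is too demanding, you can "de-vectorize" it along one axis (add a for loop).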
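One possible way to de-vectorize along one axis is sketched below. It is not part of the original answer: the function name feature_dist_lowmem is made up for illustration, and the loop processes one feature at a time so that only a slab of shape (pixels in that feature) x N is held in memory instead of the full N x N matrix.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial.distance import cdist

def feature_dist_lowmem(labeled):
    """Hypothetical low-memory variant: minimum edge-to-edge distance
    between every pair of labels, computed one feature at a time."""
    I, J = np.nonzero(labeled)
    labels = labeled[I, J]
    coords = np.column_stack((I, J))

    # Group pixels by label so each feature is a contiguous block
    sorter = np.argsort(labels, kind="stable")
    labels = labels[sorter]
    coords = coords[sorter]
    start_idx = np.flatnonzero(np.r_[1, np.diff(labels)])
    end_idx = np.r_[start_idx[1:], len(labels)]
    n_feat = len(start_idx)

    out = np.empty((n_feat, n_feat))
    for k in range(n_feat):
        block = coords[start_idx[k]:end_idx[k]]
        # Squared distances from the pixels of feature k to all pixels
        sq = cdist(block, coords, "sqeuclidean")
        # Minimum over feature k's pixels, then over each other feature
        out[k] = np.minimum.reduceat(sq.min(axis=0), start_idx)
    return np.sqrt(out)

# Usage on the three-pixel example from the question
a = np.zeros((8, 8), dtype=int)
a[2, 2] = a[2, 6] = a[5, 5] = 1
labeled, n = ndimage.label(a)
d = feature_dist_lowmem(labeled)
print(d)
```

The trade-off is a Python-level loop over features in exchange for a smaller peak allocation, so it mainly helps when there are many nonzero pixels spread over relatively few features.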

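Thanks! I just tested it on one of my datasets and it ran almost 89% faster. The power of vectorization. I don't fully understand why 'sqeuclidean' is calculated, though. It also returns different values when I try to calculate the mean of all distances (see the edit in the question).

Ah, figured it out (see above). pdist did not return the correct distance matrix, so my previous values were wrong... Thanks again for your solution.

@Curlew - The squared Euclidean distance is faster to compute. Note that I only use it for intermediate results; the square root is taken in the return statement.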
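The point about squared distances can be checked with a small sketch. Because the square root is monotonic, taking the minimum of the squared distances and then a single square root should give the same result as taking the minimum of the plain Euclidean distances. The two point sets below are illustrative values taken from the test array in the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Pixel coordinates of two patches (illustrative values)
p = np.array([[2, 2], [3, 1], [3, 2]])
q = np.array([[5, 5], [5, 6], [6, 5]])

# One square root at the end, applied to the minimum squared distance
min_via_sq = np.sqrt(cdist(p, q, "sqeuclidean").min())

# A square root for every pair, then the minimum
min_direct = cdist(p, q, "euclidean").min()

print(min_via_sq, min_direct)
```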