Minimum Euclidean distance and corresponding indices between a Python array and array columns
Update: in case this is useful to anyone. As described below, the Euclidean distance calculation and np.argmin account for almost the entire runtime. By rewriting the distance calculation with numba I can shave off at least 20% in most cases, compared to the already fast np.einsum:
from numba import jit

@jit(nopython=True)
def calculateDistances_numba(currentLocation, traces):
    # Deltas along each coordinate; one page (small array) per column.
    deltaX = traces[:, 0, :] - currentLocation[0]
    deltaY = traces[:, 1, :] - currentLocation[1]
    deltaZ = traces[:, 2, :] - currentLocation[2]
    distances = (deltaX**2 + deltaY**2 + deltaZ**2)**0.5
    return distances
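For reference, the einsum formulation that the numba kernel is benchmarked against can be written as a standalone function; a minimal sketch on random data (shapes are illustrative), producing the same (L, n) distance matrix:

```python
import numpy as np

def calculateDistances_einsum(currentLocation, traces):
    # Sum the squared coordinate deltas over axis=1, then take the root;
    # the result has shape (L, n): one distance per point per page.
    deltas = traces - currentLocation[None, :, None]
    return np.einsum('ijk,ijk->ik', deltas, deltas)**0.5

rng = np.random.default_rng(0)
traces = rng.random((100, 3, 10))        # (L, 3, n), illustrative sizes
currentLocation = rng.random(3)

d = calculateDistances_einsum(currentLocation, traces)
assert d.shape == (100, 10)
```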
~~~~~
The problem
I have a large array, vertices.shape = (N, 3); N ~ 5e6, describing the 3D vertices of an unstructured grid. I have n much smaller arrays of coordinates and data that I want to linearly interpolate onto the vertices. They are stored along the third axis of another array, traces.shape = (L, 3, n); L ~ 2e4; n ~ 2e3. For each vertex (row in vertices) I want to quickly find the two closest points that come from different small arrays (pages of traces, i.e. points whose indices along axis=2 differ). By closest I mean the Euclidean distance d = (deltaX**2 + deltaY**2 + deltaZ**2)**0.5. The purpose of this function is to linearly interpolate onto the points in vertices between two known values.

My current function works reasonably well, but becomes prohibitively slow for the expected array sizes given above (8+ hours). I have profiled my entire code and can say with certainty that this calculation is the expensive one.
Current function
import numpy as np

def interpolate(currentLocation, traces, nTraces):
    # Calculate the Euclidean distance between currentLocation and all points
    # in the search bracket. Einsum was found to be faster than np.linalg.norm
    # as well as standard numpy operations.
    # distances is a 2D array of shape (L, n) and holds the Euclidean distance
    # between currentLocation and every point in traces.
    deltas = traces - currentLocation[None, :, None]
    distances = np.einsum('ijk,ijk->ik', deltas, deltas)**0.5
    # Searching along axis=1 is definitely a little bit faster,
    # but that hasn't been implemented.
    # rowIndices is a 1D array whose elements are the indices of the
    # smallest distance for each page (small array) of traces.
    rowIndices = np.argmin(distances, axis=0)
    # Get the actual distances
    min_distances = distances[rowIndices, np.arange(nTraces)]
    # Indices of the two smallest traces (pages)
    columnIndices = np.argpartition(min_distances, 2)[:2]
    # Row indices of the two closest points
    rowIndices = rowIndices[columnIndices]
    # Distances to the two closest points
    closePoints_distances = min_distances[columnIndices]
    # Calculate the interpolant weights based on the distances
    interpolantWeights = closePoints_distances/np.sum(closePoints_distances)
    # Return the indices because I need to retrieve the data for the close points
    # Return the interpolant weights to interpolate the data once retrieved
    return rowIndices, columnIndices, interpolantWeights

vertices = np.random.rand(200, 3)
traces = np.random.rand(100, 3, 10)
nTraces = traces.shape[-1]

# This is a simplified version of what actually happens.
for index, currentLocation in enumerate(vertices):
    interpolate(currentLocation, traces, nTraces)
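The argpartition and weighting steps above can be checked in isolation; a small sketch on a hypothetical distance vector, showing that np.argpartition(a, 2)[:2] yields the indices of the two smallest entries (in no guaranteed order):

```python
import numpy as np

# Hypothetical per-page minimum distances for one vertex.
min_distances = np.array([0.9, 0.2, 0.7, 0.1, 0.5])

# Partial sort: the first two positions hold the indices of the two
# smallest values, though their relative order is not guaranteed.
columnIndices = np.argpartition(min_distances, 2)[:2]
assert set(columnIndices) == {1, 3}   # 0.2 and 0.1 are the two smallest

# Distance-based weights as in interpolate(); they sum to 1 by construction.
closePoints_distances = min_distances[columnIndices]
interpolantWeights = closePoints_distances / np.sum(closePoints_distances)
assert np.isclose(interpolantWeights.sum(), 1.0)
```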
%timeit output
%timeit interpolater(currentLocation, streamlineBlock, nStreamlines)
10 loops, best of 3: 42.8 ms per loop
Because of the structure of the data I can select just a block of traces to search over (L ~ 2e3), which significantly reduces the runtime. The bracket to search over is a function of currentLocation:
%timeit interpolaterNew(...)
100 loops, best of 3: 6.27 ms per loop
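The bracket-selection code itself is not shown in the question; a minimal sketch of one way it might work, assuming an auxiliary sorted array of per-row x coordinates (trace_x_starts and halfWidth are made-up names and parameters) located with np.searchsorted, which does appear in the profile:

```python
import numpy as np

def selectBracket(currentLocation, traces, trace_x_starts, halfWidth=5):
    # trace_x_starts: 1D sorted array of per-row x coordinates used to
    # locate currentLocation within traces (hypothetical data layout).
    i = np.searchsorted(trace_x_starts, currentLocation[0])
    lo = max(i - halfWidth, 0)
    hi = min(i + halfWidth, traces.shape[0])
    return traces[lo:hi]

rng = np.random.default_rng(1)
traces = rng.random((100, 3, 10))
trace_x_starts = np.sort(traces[:, 0, 0])
block = selectBracket(rng.random(3), traces, trace_x_starts)
assert block.shape[1:] == (3, 10)
```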
cProfile output
cProfile tells me that np.einsum and np.argmin are the slowest; in fact they make up the vast majority of the computation. Note that this is for a small subset of my data and so may not accurately reflect the function above:
4251460 function calls (4151427 primitive calls) in 17.907 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
50012 7.236 0.000 17.063 0.000 vc3d_traces.py:554(interpolateToVertices)
50012 3.940 0.000 3.940 0.000 {built-in method numpy.core.multiarray.c_einsum}
50012 3.505 0.000 3.505 0.000 {method 'argmin' of 'numpy.ndarray' objects}
100025 0.291 0.000 0.291 0.000 {method 'reduce' of 'numpy.ufunc' objects}
50012 0.289 0.000 17.352 0.000 frame.py:4238(f)
50012 0.223 0.000 0.223 0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
100024 0.191 0.000 0.346 0.000 indexing.py:1815(_convert_key)
1 0.190 0.190 17.905 17.905 {pandas._libs.lib.reduce}
100024 0.159 0.000 0.504 0.000 fromnumeric.py:1730(sum)
100024 0.155 0.000 0.155 0.000 {method 'get_value' of 'pandas._libs.index.IndexEngine' objects}
Questions
I am now a bit at a loss as to how to improve performance. Given that the distance calculation and the argmin search are the most expensive steps, is it possible to "vectorize" them, applying the calculation to the whole vertices array at once? I did try broadcasting over a fourth axis without success: the computer froze. Does the cProfile report point to anything else, or is there an obvious error in my code? Can someone point me to a better way of doing this? Finally, the iterations per second reported by tqdm drop off quickly and substantially (from 250).

Comment: Have you looked into using a kd-tree?

@BiRico I haven't applied a kd-tree to this problem for two reasons: 1) I think I would need to build one tree per small array (n trees) in order to return two points from different arrays. 2) I don't think I could take advantage of the cut-out block of traces without rebuilding the trees. But I will check out sklearn.neighbors.KDTree, thanks. Update: for the reasons specified above, kd-trees do not seem like a good option for this particular problem, unless a single tree can be built and queried in a way that returns two points from separate small arrays.
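For what it's worth, the one-tree-per-page approach discussed above can at least be sketched with scipy.spatial.cKDTree (random data, illustrative sizes); because each page gets its own tree, the two candidates are guaranteed to come from different pages, mirroring the argmin-per-page plus argpartition steps:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
L, n = 100, 10
traces = rng.random((L, 3, n))
vertices = rng.random((200, 3))

# One tree per page (small array), built from the (L, 3) points of that page.
trees = [cKDTree(traces[:, :, j]) for j in range(n)]

# For every vertex, the nearest point in each page via one batched query.
dists = np.empty((vertices.shape[0], n))
rows = np.empty((vertices.shape[0], n), dtype=np.intp)
for j, tree in enumerate(trees):
    dists[:, j], rows[:, j] = tree.query(vertices)

# Two closest pages per vertex, matching the argpartition step above.
columnIndices = np.argpartition(dists, 2, axis=1)[:, :2]
assert columnIndices.shape == (200, 2)
```

Whether this beats the einsum approach depends on how often the trees must be rebuilt when the search bracket changes, which is exactly the concern raised above.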