Python vectorized nearest-neighbor computation


I have the following function, which computes nearest neighbors and returns an array:

def p_batch(U,X,Y):
    return [nearest(u,X,Y) for u in U]
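I want to replace the for loop with NumPy. I have been looking at numpy.vectorize(), since that seems to be the right approach, but I can't get it to work. This is what I have tried so far: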

def n_batch(U,X,Y):
    vbatch = np.vectorize(nearest)
    return vbatch(U,X,Y)
Can anyone tell me what I'm doing wrong?

Edit:

Implementation:

import numpy as np
def nearest(u,X,Y):
    # Label of the row of X with the smallest Euclidean distance to u
    return Y[np.argmin(np.sqrt(np.sum(np.square(np.subtract(u,X)),axis=1)))]
Shapes of U, X and Y: U is (M, d), X is (N, d) and Y has length N, with M=20, N=100, d=50.
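A note on why the np.vectorize attempt fails: without further arguments, np.vectorize broadcasts U, X and Y against each other and calls nearest once per scalar element. With shapes (M, d), (N, d) and (N,) this raises a shape error whenever M differs from N, and even when the shapes do broadcast, nearest would receive scalars rather than rows. np.vectorize also accepts a gufunc-style signature argument that makes it pass whole rows instead. The sketch below assumes U is (M, d), X is (N, d) and Y has length N. It is still a Python-level loop internally (the NumPy docs describe vectorize as a convenience, not a performance tool), so it fixes correctness rather than speed.

```python
import numpy as np

def nearest(u, X, Y):
    # Label of the row of X closest to u (Euclidean distance)
    return Y[np.argmin(np.sqrt(np.sum(np.square(np.subtract(u, X)), axis=1)))]

def n_batch(U, X, Y):
    # '(d),(n,d),(n)->()' means each call receives one row u of length d,
    # the whole (n, d) matrix X and the whole length-n Y, and returns a
    # scalar. Only the leading axis of U is looped over.
    vbatch = np.vectorize(nearest, signature='(d),(n,d),(n)->()')
    return vbatch(U, X, Y)
```

With the arrays from the question, n_batch(U, X, Y) returns the same labels as p_batch(U, X, Y), but as a NumPy array instead of a list.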

Approach #1

You can use cdist to generate all those Euclidean distances, then simply use argmin and index into Y:

from scipy.spatial.distance import cdist

out = Y[cdist(U,X).argmin(1)]
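If SciPy is not available, the same idea can be written with plain NumPy broadcasting. A minimal sketch, assuming U is (M, d), X is (N, d) and Y has length N; nearest_broadcast is a hypothetical name. It skips the square root, because sqrt is monotonic and does not change which index is smallest. The cost is an (M, N, d) temporary array, which is small for M=20, N=100, d=50 but grows quickly with larger inputs.

```python
import numpy as np

def nearest_broadcast(U, X, Y):
    # (M, 1, d) - (1, N, d) -> (M, N, d): differences for every (u, x) pair
    diff = U[:, None, :] - X[None, :, :]
    # Squared Euclidean distances, shape (M, N); sqrt omitted since it
    # does not change the argmin
    d2 = (diff ** 2).sum(axis=-1)
    # Nearest row of X for each row of U, then look up its label in Y
    return Y[d2.argmin(axis=1)]
```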
Sample run:

In [76]: M,N,d = 5,6,3
    ...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
    ...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
    ...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
    ...: 

# Using a loop comprehension to verify values
In [77]: [nearest(U[i], X,Y) for i in range(len(U))]
Out[77]: [1, 0, 0, 1, 1]

In [78]: Y[cdist(U,X).argmin(1)]
Out[78]: array([1, 0, 0, 1, 1])
Approach #2

Another option is pairwise_distances_argmin from sklearn.metrics.pairwise, which gives us those argmin indices directly:

from sklearn.metrics import pairwise

Y[pairwise.pairwise_distances_argmin(U,X)]
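For larger inputs, the (M, N, d) temporary of a broadcasting solution can be avoided by expanding the squared distance as ||u - x||^2 = ||u||^2 - 2 u.x + ||x||^2, which needs only one (M, N) matrix product. This is a sketch of that standard identity, not a description of how pairwise_distances_argmin works internally; nearest_dot is a hypothetical name. Floating-point cancellation can make tiny distances slightly negative or reorder near-ties compared with cdist, which rarely matters for argmin.

```python
import numpy as np

def nearest_dot(U, X, Y):
    # Squared row norms, shaped (M, 1) and (1, N) so they broadcast to (M, N)
    uu = np.sum(U * U, axis=1)[:, None]
    xx = np.sum(X * X, axis=1)[None, :]
    # ||u - x||^2 = ||u||^2 - 2 u.x + ||x||^2 for every pair, shape (M, N)
    d2 = uu - 2.0 * (U @ X.T) + xx
    # uu is constant along each row, so it does not affect the argmin;
    # it is kept only so d2 holds true squared distances
    return Y[d2.argmin(axis=1)]
```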

Runtime test with M=20, N=100, d=50:

In [90]: M,N,d = 20,100,50
    ...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
    ...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
    ...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
    ...: 
Timing cdist against pairwise_distances_argmin:

In [91]: %timeit cdist(U,X).argmin(1)
10000 loops, best of 3: 55.2 µs per loop

In [92]: %timeit pairwise.pairwise_distances_argmin(U,X)
10000 loops, best of 3: 90.6 µs per loop
Timings against the loopy version:

In [93]: %timeit [nearest(U[i], X,Y) for i in range(len(U))]
1000 loops, best of 3: 298 µs per loop

In [94]: %timeit Y[cdist(U,X).argmin(1)]
10000 loops, best of 3: 55.6 µs per loop

In [95]: %timeit Y[pairwise.pairwise_distances_argmin(U,X)]
10000 loops, best of 3: 91.1 µs per loop

In [96]: 298.0/55.6   # Speedup with cdist over loopy one
Out[96]: 5.359712230215827

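Could you share the implementation of nearest, or is it imported from some library/package? Also, for better performance you might look at approaches other than np.vectorize, but that is only possible if you show us nearest and what U, X and Y look like.

np.vectorize is probably never the right approach. I think you'd be better off with a kd-tree; at least its documentation says: "kd-tree for quick nearest-neighbor lookup".

@Divakar The nearest implementation has been added.

@阿雅姆·卡蒂 I've added them to the original question!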
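A minimal sketch of the kd-tree suggestion, assuming SciPy's scipy.spatial.KDTree is the structure meant (its query method returns the distances and indices of the nearest neighbors); nearest_kdtree is a hypothetical name. A tree pays off mainly when the same X is queried many times or when the dimension is low; kd-trees tend to lose their advantage in high dimensions such as d=50, so it is worth timing against cdist before switching.

```python
import numpy as np
from scipy.spatial import KDTree

def nearest_kdtree(U, X, Y):
    # Build the tree once over the reference points X, shape (N, d)
    tree = KDTree(X)
    # k=1: for each row of U, the distance to and index of its nearest row of X
    _, idx = tree.query(U, k=1)
    # idx has shape (M,), so it indexes straight into the labels
    return Y[idx]
```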