Vectorizing a nearest-neighbor computation in Python
I have the following function, which returns a list of nearest-neighbor results, one per row of U:
def p_batch(U,X,Y):
    return [nearest(u,X,Y) for u in U]
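I want to replace the for loop with NumPy. I have been looking at numpy.vectorize(), since it seems like the right approach, but I cannot get it to work. This is what I have tried so far: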
def n_batch(U,X,Y):
    vbatch = np.vectorize(nearest)
    return vbatch(U,X,Y)
Can anyone tell me where I went wrong?
Edit:
Implementation of nearest:
def nearest(u,X,Y):
    return Y[np.argmin(np.sqrt(np.sum(np.square(np.subtract(u,X)),axis=1)))]
Data for U, X and Y (M=20, N=100, d=50): U has shape (M, d), X has shape (N, d), and Y has shape (N,).
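On why the np.vectorize attempt fails: by default np.vectorize broadcasts all of its arguments against each other and hands the function one scalar at a time, so nearest never receives a whole row of U together with the full X and Y. A minimal sketch of one possible fix, assuming NumPy 1.12 or later, is to declare the core dimensions through the signature argument. The name n_batch_sig is made up for illustration. Note that np.vectorize is still a Python-level loop internally, so this is a convenience rather than a speedup.

```python
import numpy as np

def nearest(u, X, Y):
    # Label of the row of X closest to u (Euclidean distance).
    return Y[np.argmin(np.sqrt(np.sum(np.square(np.subtract(u, X)), axis=1)))]

def n_batch_sig(U, X, Y):
    # Hypothetical name. The signature reads: u is a length-d vector,
    # X is an (n, d) array, Y is a length-n array, the result is a scalar.
    vbatch = np.vectorize(nearest, signature='(d),(n,d),(n)->()')
    return vbatch(U, X, Y)

# Small made-up data set, only used to compare against the loop version.
rng = np.random.RandomState(0)
U = rng.uniform(0, 1, (5, 3))
X = rng.uniform(0, 1, (6, 3))
Y = rng.randint(0, 2, 6)

loop_result = np.array([nearest(u, X, Y) for u in U])
vec_result = n_batch_sig(U, X, Y)
```

With the signature in place, np.vectorize loops over the leading axis of U only and passes X and Y through unchanged on every call.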
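Approach #1
You can use scipy.spatial.distance.cdist to compute all the Euclidean distances, then simply take argmin along each row and index into Y: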
from scipy.spatial.distance import cdist
out = Y[cdist(U,X).argmin(1)]
Sample run:
In [76]: M,N,d = 5,6,3
...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
...:
# Using a loop comprehension to verify values
In [77]: [nearest(U[i], X,Y) for i in range(len(U))]
Out[77]: [1, 0, 0, 1, 1]
In [78]: Y[cdist(U,X).argmin(1)]
Out[78]: array([1, 0, 0, 1, 1])
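If SciPy is not available, the same distance matrix can be built with plain NumPy broadcasting. The following is a minimal sketch and not part of the original answer; nearest_batch is a made-up name, and the data is random filler. The intermediate array has shape (M, N, d), so memory use grows quickly with large inputs.

```python
import numpy as np

def nearest_batch(U, X, Y):
    # diff[i, j, :] = U[i] - X[j], shape (M, N, d)
    diff = U[:, None, :] - X[None, :, :]
    # Squared distances, shape (M, N). The square root is skipped
    # because it is monotonic and does not change the argmin.
    sq_dist = (diff ** 2).sum(axis=2)
    return Y[sq_dist.argmin(axis=1)]

# Made-up data for a quick comparison with the per-row loop.
rng = np.random.RandomState(0)
U = rng.uniform(0, 1, (5, 3))
X = rng.uniform(0, 1, (6, 3))
Y = rng.randint(0, 2, 6)

batch_result = nearest_batch(U, X, Y)
loop_check = np.array(
    [Y[np.argmin(np.sqrt(np.sum((u - X) ** 2, axis=1)))] for u in U]
)
```

For larger problems cdist is likely the better choice, since it avoids materializing the (M, N, d) intermediate.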
Approach #2
Another way is sklearn's pairwise_distances_argmin, which gives us those argmin indices directly:
from sklearn.metrics import pairwise
Y[pairwise.pairwise_distances_argmin(U,X)]
Runtime test
With M=20, N=100, d=50:
In [90]: M,N,d = 20,100,50
...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
...:
Timings of cdist against pairwise_distances_argmin:
In [91]: %timeit cdist(U,X).argmin(1)
10000 loops, best of 3: 55.2 µs per loop
In [92]: %timeit pairwise.pairwise_distances_argmin(U,X)
10000 loops, best of 3: 90.6 µs per loop
Timings against the loop version:
In [93]: %timeit [nearest(U[i], X,Y) for i in range(len(U))]
1000 loops, best of 3: 298 µs per loop
In [94]: %timeit Y[cdist(U,X).argmin(1)]
10000 loops, best of 3: 55.6 µs per loop
In [95]: %timeit Y[pairwise.pairwise_distances_argmin(U,X)]
10000 loops, best of 3: 91.1 µs per loop
In [96]: 298.0/55.6 # Speedup with cdist over loopy one
Out[96]: 5.359712230215827
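Can you share the implementation of nearest, or is it imported from some library/package? Also, for performance you may want to look at approaches other than np.vectorize. That is only possible if you show us nearest, U, X and Y.
np.vectorize is probably never the right approach. I think you would be better off with a kd-tree; at least the documentation says so: "kd-tree for quick nearest-neighbor lookup".
@Divakar The nearest implementation has been added.
@阿雅姆·卡蒂: I have added them to the original question!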
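The kd-tree suggestion in the comments can be sketched as follows, assuming the comment refers to SciPy's scipy.spatial.cKDTree (the comment itself does not name the class). The data here is random filler with the same shapes as in the question. In high dimensions such as d=50, kd-trees tend to lose much of their advantage over brute force, so it is worth timing this against cdist before adopting it.

```python
import numpy as np
from scipy.spatial import cKDTree

# Made-up data with the shapes from the question (M=20, N=100, d=50).
rng = np.random.RandomState(0)
U = rng.uniform(0, 1, (20, 50))
X = rng.uniform(0, 1, (100, 50))
Y = rng.randint(0, 2, 100)

tree = cKDTree(X)           # build once, reuse for many queries
dist, idx = tree.query(U)   # k=1 by default: nearest row of X per row of U
tree_result = Y[idx]

# Brute-force reference for comparison.
brute = Y[((U[:, None, :] - X[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)]
```

The tree pays off mainly when X stays fixed and many batches of queries are run against it.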