Vectorized average K-nearest-neighbour distance in Python
This is a K-nearest-neighbours algorithm for points in R^n that should compute the average distance of each point to its k nearest neighbours. The problem is that, although it is vectorized, it is inefficient in the sense that I am repeating myself. I would be very happy if someone could help me improve this code:
import numpy as np
from scipy.spatial.distance import pdist, squareform

def nn_args_R_n_squared(points):
    """Return the pairwise squared-distance matrix of points, together with the indices that sort each of its rows."""
    dist_mat = squareform(pdist(points, 'sqeuclidean'))
    return dist_mat, np.argsort(dist_mat, axis=1)

def knn_avg_dist(X, k):
    """For the points in the rows of X, calculate the average (squared) distance of each to its k nearest neighbours."""
    X_dist_mat, X_sorted_arg = nn_args_R_n_squared(X)
    # Differences between each point and its k nearest neighbours (column 0 of the sorted indices is the point itself)
    X_matrices = (X[X_sorted_arg[:, 1:k+1]] - X[:, None, :]).astype(np.float64)
    # The squared distances are recomputed here instead of being read from X_dist_mat
    return np.mean(np.linalg.norm(X_matrices, axis=2)**2, axis=1)

X = np.random.randn(30).reshape((10, 3))
print(X)
print(knn_avg_dist(X, 3))
Output:
[[-1.87979713 0.02832699 0.18654558]
[ 0.95626677 0.4415187 -0.90220505]
[ 0.86210012 -0.88348927 0.32462922]
[ 0.42857316 1.66556448 -0.31829065]
[ 0.26475478 -1.6807253 -1.37694585]
[-0.08882175 -0.61925033 -1.77264525]
[-0.24085553 0.64426394 -0.01973027]
[-0.86926425 0.93439913 -0.31657442]
[-0.30987468 0.02925649 -1.38556347]
[-0.41801804 1.40210993 -1.04450895]]
[ 3.37983833 2.1257945 3.60884158 1.67051682 2.85013297 1.66756279
1.2678029 1.20491026 1.54623574 1.30722388]
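As you can see, I compute the distances twice, but I could not come up with a way to read the same information from X_dist_mat, because I have to read several elements from each row at once.

If you added the imports to your code and generated dummy data, people could copy and paste it to try it out. Otherwise, you should be able to get inspiration from the existing implementations in sklearn.

Thanks! You rock in the Python world! :)

Using scipy.spatial.cKDTree: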
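>>> import numpy as np
>>> import scipy.spatial
>>> data = np.random.rand(1000, 3)
>>> kdt = scipy.spatial.cKDTree(data)
>>> k = 5  # number of nearest neighbors
>>> # Query k+1 neighbours, because each point's nearest neighbour is itself (distance 0)
>>> dists, neighs = kdt.query(data, k+1)
>>> avg_dists = np.mean(dists[:, 1:], axis=1)  # drop the self-distance column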
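
Note that cKDTree.query returns plain Euclidean distances, whereas the code in the question averages squared distances, so the two results are not directly comparable. If the full distance matrix is wanted anyway, the question of reading several elements per row from it can be handled with np.take_along_axis. The following is only a minimal sketch, not the original poster's final code: the function name knn_avg_sq_dist is made up for illustration, and it assumes NumPy 1.15 or later (where np.take_along_axis is available).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def knn_avg_sq_dist(X, k):
    """Mean squared Euclidean distance from each row of X to its k nearest neighbours.

    Hypothetical helper for illustration; it reads the distances from the
    pairwise matrix instead of recomputing them.
    """
    # Full pairwise squared-distance matrix, shape (n, n).
    dist_mat = squareform(pdist(X, "sqeuclidean"))
    # Column indices that sort each row; the first column has distance 0
    # (the point itself, or an exact duplicate of it).
    order = np.argsort(dist_mat, axis=1)
    # For every row, pick the entries of dist_mat at the k nearest columns.
    nearest = np.take_along_axis(dist_mat, order[:, 1:k + 1], axis=1)
    return nearest.mean(axis=1)


if __name__ == "__main__":
    X = np.random.randn(10, 3)
    print(knn_avg_sq_dist(X, 3))
```

This still needs O(n^2) memory for the distance matrix, so for large inputs the tree-based query above is likely the better choice. If only the distances are needed and not the neighbour indices, np.sort(dist_mat, axis=1)[:, 1:k+1] gives the same values without the explicit index lookup.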