Python: numpy-based k-nearest-neighbor classifier


I am trying to implement my own kNN classifier. I have managed to get something working, but it is far too slow:

import numpy as np
from collections import Counter

def euclidean_distance(X_train, X_test):
    """
    Create list of all euclidean distances between the given
    feature vector and all other feature vectors in the training set
    """
    return [np.linalg.norm(X - X_test) for X in X_train]

def k_nearest(X, Y, k):
    """
    Get the indices of the nearest feature vectors and return a
    list of their classes
    """
    idx = np.argpartition(X, k)
    return np.take(Y, idx[:k])

def predict(X_test):
    """
    For each feature vector get its predicted class
    """
    distance_list = [euclidean_distance(X_train, X) for X in X_test]
    return np.array([Counter(k_nearest(distances, Y_train, k)).most_common()[0][0] for distances in distance_list])

where (for example)

Obviously it would be much faster without the for loops, but I don't know how to make it work without them. Is there a way to do this without loops or list comprehensions?

Here's a vectorized approach:

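import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import mode

# X: the feature vectors to classify (presumably X_test from the question)
dists = cdist(X_train, X)                     # pairwise distances, shape (n_train, n_test)
idx = np.argpartition(dists, k, axis=0)[:k]   # training indices of the k nearest points per column
nearest_dists = np.take(Y_train, idx)         # despite the name, these are the neighbors' class labels
out = mode(nearest_dists,axis=0)[0]           # majority vote per test point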
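
For readers without SciPy, the same idea can be sketched with NumPy alone. This is a minimal sketch, not the answerer's code. The function name knn_predict is made up for illustration, labels are assumed to be non-negative integers, and cdist is swapped for the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. Squared distances are enough here because only the ranking of neighbors matters. It partitions at k - 1 rather than k, which also works when k equals the number of training points.

```python
import numpy as np

def knn_predict(X_train, Y_train, X_test, k):
    """Brute-force kNN without Python-level loops (hypothetical helper).

    Assumes Y_train contains non-negative integer class labels.
    """
    # Squared Euclidean distances, shape (n_test, n_train)
    sq_dists = (
        np.sum(X_test ** 2, axis=1)[:, None]
        + np.sum(X_train ** 2, axis=1)[None, :]
        - 2.0 * X_test @ X_train.T
    )
    # Indices of the k smallest distances in each row (unordered within the k)
    idx = np.argpartition(sq_dists, k - 1, axis=1)[:, :k]
    neighbor_labels = Y_train[idx]                      # shape (n_test, k)
    # Majority vote: count occurrences of every class per row
    n_classes = int(Y_train.max()) + 1
    votes = (neighbor_labels[:, :, None] == np.arange(n_classes)).sum(axis=1)
    return votes.argmax(axis=1)                         # ties go to the smaller label

# Two well-separated synthetic clusters, one per class
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
Y_train = np.repeat([0, 1], 50)
X_test = np.array([[0.1, -0.2], [4.8, 5.1]])
pred = knn_predict(X_train, Y_train, X_test, k=5)
print(pred)
```

The vote builds an (n_test, k, n_classes) boolean array, so this form suits a small number of classes. The full (n_test, n_train) distance matrix is still held in memory.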

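What is X_train?

@Divakar You split the data into a training set and a test set. Suppose it actually has 200 rows of X, y values instead of 2 rows; it is then split into X_train and X_test. I also managed to implement it using spatial.KDTree, and that is definitely faster, but it still takes 40 seconds when I try it (down from 240 seconds). I can't understand how sklearn does this in 0.7 seconds.

@user5368737 I don't know its internals. If I had to guess, I'd say it probably doesn't compute all the distances and then throw away all but the nearest k, as we do here. But yes, I can see that KDTree is very fast compared with any Python/NumPy implementation.

@user5368737 Just curious: did you profile the proposed code to see which step takes the most time on the larger dataset?

I didn't use your code; I used my own spatial.KDTree version, and this is the query I run (X = Y_train[tree.query(X_test, k=k)[1]]). It takes a long time because X_train.shape = (268288, 2). Unfortunately, I don't know how to make this faster...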
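
With a training set of that size, a brute-force distance matrix over all test points may not fit in memory. One possible workaround is to process the test points in batches. This is only a sketch under assumptions: knn_predict_chunked and chunk_size are hypothetical names, labels are assumed to be non-negative integers, and squared distances are used because only the neighbor ranking matters. It does not reduce the amount of arithmetic, so it is not expected to match a tree-based search. It only bounds peak memory.

```python
import numpy as np

def knn_predict_chunked(X_train, Y_train, X_test, k, chunk_size=128):
    """Brute-force kNN over batches of test points (hypothetical helper).

    Assumes Y_train contains non-negative integer class labels.
    """
    n_classes = int(Y_train.max()) + 1
    train_sq = np.sum(X_train ** 2, axis=1)       # computed once, reused for every batch
    out = np.empty(len(X_test), dtype=np.intp)
    # The only Python loop runs over batches, not over individual points
    for start in range(0, len(X_test), chunk_size):
        batch = X_test[start:start + chunk_size]
        # Squared distances for this batch only, shape (len(batch), n_train)
        sq = (
            np.sum(batch ** 2, axis=1)[:, None]
            + train_sq[None, :]
            - 2.0 * batch @ X_train.T
        )
        idx = np.argpartition(sq, k - 1, axis=1)[:, :k]
        labels = Y_train[idx]
        votes = (labels[:, :, None] == np.arange(n_classes)).sum(axis=1)
        out[start:start + chunk_size] = votes.argmax(axis=1)
    return out

# Tiny hand-made example; chunk_size=1 forces several batches
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
Y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.2], [10.5, 10.5], [0.9, 0.1]])
pred_chunked = knn_predict_chunked(X_train, Y_train, X_test, k=3, chunk_size=1)
print(pred_chunked)
```

With 268288 training rows in float64, a batch of 128 test points needs roughly 275 MB for the distance matrix alone, so chunk_size should be tuned to the available memory.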
