Optimizing a Python function with numpy without using a for loop


I have the following Python function:

def npnearest(u: np.ndarray, X: np.ndarray, Y: np.ndarray, distance: 'callable'=npdistance):
    '''
    Finds x1 so that x1 is in X and u and x1 have a minimal distance (according to the 
    provided distance function) compared to all other data points in X. Returns the label of x1

    Args:
        u (np.ndarray): The vector (ndim=1) we want to classify
        X (np.ndarray): A matrix (ndim=2) with training data points (vectors)
        Y (np.ndarray): A vector containing the label of each data point in X
        distance (callable): A function that receives two inputs and defines the distance function used

    Returns:
        int: The label of the data point which is closest to `u`
    '''

    xbest = None
    ybest = None
    dbest = float('inf')

    for x, y in zip(X, Y):
        d = distance(u, x)
        if d < dbest:
            ybest = y
            xbest = x
            dbest = d

    return ybest

I want to optimize npnearest by performing the nearest-neighbor search directly in numpy. This means the function must not use for/while loops.

Thanks

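Since you don't need to use this exact function, you can simply change the sum so that it works along a specific axis. This returns a new array containing the computed distances, on which you can call argmin to get the index of the minimum. Use that index to look up your label: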

import numpy as np

def npdistance_idx(x1, x2):
    return np.argmin(np.sum((x1-x2)**2, axis=1))

Y = ["label 0", "label 1", "label 2", "label 3"]
u = np.array([[1, 5.5]])
X = np.array([[1,2], [1, 5], [0, 0], [7, 7]])

idx = npdistance_idx(X,  u)
print(Y[idx])  # label 1
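
To connect this back to the signature in the question, here is a minimal sketch of a loop-free npnearest. The name npnearest_vectorized is hypothetical, and it assumes that squared Euclidean distance is an acceptable stand-in for npdistance, whose definition is not shown in the question.

```python
import numpy as np

def npnearest_vectorized(u, X, Y):
    """Return the label of the row of X closest to u.

    Assumption: squared Euclidean distance replaces the question's
    npdistance, which is not shown in the original post.
    """
    u = np.asarray(u).ravel()                # accept shape (d,) or (1, d)
    X = np.asarray(X)
    sq_dists = np.sum((X - u) ** 2, axis=1)  # one distance per row of X
    return Y[int(np.argmin(sq_dists))]       # label at the smallest distance

X = np.array([[1, 2], [1, 5], [0, 0], [7, 7]])
Y = np.array([0, 1, 2, 3])
u = np.array([1, 5.5])

label = npnearest_vectorized(u, X, Y)
print(label)  # 1
```

Skipping the square root is safe here because it does not change which row has the smallest distance.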

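Numpy supports vectorized operations.

This means you can pass in arrays, and the operation is applied to the whole array in an optimized way (SIMD: single instruction, multiple data).

You can then use .argmin() to get the index of the smallest element.

Hope this helps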

In [9]: numbers = np.arange(10); numbers
Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [10]: numbers -= 5; numbers
Out[10]: array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])
In [11]: numbers = np.power(numbers, 2); numbers
Out[11]: array([25, 16,  9,  4,  1,  0,  1,  4,  9, 16])
In [12]: numbers.argmin()
Out[12]: 5

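Do you have to use the npdistance function?

No, I don't really need to.

Thanks for your answer. It works, but somehow it is slower than the Python version? Here is the timing plot. That shouldn't happen, right?

@BlueMango I don't know how you got those results. When I tested it with 300 pairs for X, the numpy version took 20.8 µs ± 543 ns per loop, compared with 2.47 ms ± 25.8 µs per loop for the Python version. The difference only grows as the arrays get larger.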
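
A comparison like the one described in the comment above could be reproduced with something along these lines. This is only a sketch: the random data, the 300-row size, the function names nearest_loop and nearest_vec, and the use of squared Euclidean distance are all assumptions, and the actual timings will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((300, 2))   # 300 hypothetical training points
Y = np.arange(300)         # one integer label per point
u = rng.random(2)          # query vector

def nearest_loop(u, X, Y):
    # Loop version, mirroring the structure of the question's npnearest
    ybest, dbest = None, float("inf")
    for x, y in zip(X, Y):
        d = np.sum((u - x) ** 2)
        if d < dbest:
            ybest, dbest = y, d
    return ybest

def nearest_vec(u, X, Y):
    # Vectorized version: all distances in one array expression
    return Y[np.argmin(np.sum((X - u) ** 2, axis=1))]

# Both versions should agree before their timings are compared
same = nearest_loop(u, X, Y) == nearest_vec(u, X, Y)

t_loop = timeit.timeit(lambda: nearest_loop(u, X, Y), number=200)
t_vec = timeit.timeit(lambda: nearest_vec(u, X, Y), number=200)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s for 200 calls")
```

Timing each function on the same inputs, after checking that they return the same label, helps rule out measurement mistakes such as timing array construction together with the search.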
