Optimizing a Python function with numpy, without using a for loop
I have the following Python function:
    def npnearest(u: np.ndarray, X: np.ndarray, Y: np.ndarray, distance: 'callable' = npdistance):
        '''
        Finds x1 so that x1 is in X and u and x1 have a minimal distance (according to the
        provided distance function) compared to all other data points in X. Returns the label of x1

        Args:
            u (np.ndarray): The vector (ndim=1) we want to classify
            X (np.ndarray): A matrix (ndim=2) with training data points (vectors)
            Y (np.ndarray): A vector containing the label of each data point in X
            distance (callable): A function that receives two inputs and defines the distance function used

        Returns:
            int: The label of the data point which is closest to `u`
        '''
        xbest = None
        ybest = None
        dbest = float('inf')

        for x, y in zip(X, Y):
            d = distance(u, x)
            if d < dbest:
                ybest = y
                xbest = x
                dbest = d

        return ybest
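I want to optimize npnearest by performing the nearest-neighbor search directly in numpy. This means the function must not use for/while loops.
Thanks

Since you don't need to use that exact function, you can simply change the sum so that it operates along a specific axis. This returns a new array holding one distance per row of X, and you can call argmin on it to get the index of the minimum. Use that index to look up your label: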
    import numpy as np

    def npdistance_idx(x1, x2):
        # Squared Euclidean distance per row, then the index of the smallest one
        return np.argmin(np.sum((x1-x2)**2, axis=1))

    Y = ["label 0", "label 1", "label 2", "label 3"]
    u = np.array([[1, 5.5]])
    X = np.array([[1,2], [1, 5], [0, 0], [7, 7]])

    idx = npdistance_idx(X, u)
    print(Y[idx]) # label 1
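To tie this back to the signature in the question, here is a minimal sketch of a loop-free version. The name npnearest_vectorized is hypothetical, and since the question never shows npdistance, the sketch assumes squared Euclidean distance and drops the distance parameter.

```python
import numpy as np

def npnearest_vectorized(u: np.ndarray, X: np.ndarray, Y: np.ndarray):
    """Return the label of the row of X closest to u (hypothetical helper).

    Assumption: squared Euclidean distance, because npdistance is not shown.
    """
    diffs = X - u                        # broadcasting: (n, d) - (d,) -> (n, d)
    dists = np.sum(diffs ** 2, axis=1)   # one squared distance per row, shape (n,)
    return Y[np.argmin(dists)]           # argmin returns the first minimum on ties

X = np.array([[1, 2], [1, 5], [0, 0], [7, 7]])
Y = np.array([0, 1, 2, 3])
u = np.array([1, 5.5])

print(npnearest_vectorized(u, X, Y))  # 1
```

On ties, argmin picks the first minimum, which matches the strict `d < dbest` comparison in the original loop.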
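Numpy supports vectorized operations. This means you can pass in arrays and the operation is applied to the whole array in an optimized way (SIMD: single instruction, multiple data). You can then use .argmin() to get the index of the minimum.
Hope this helps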
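    In [9]: numbers = np.arange(10); numbers
    Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

    In [10]: numbers -= 5; numbers
    Out[10]: array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])

    In [11]: numbers = np.power(numbers, 2); numbers
    Out[11]: array([25, 16, 9, 4, 1, 0, 1, 4, 9, 16])

    In [12]: numbers.argmin()
    Out[12]: 5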
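The same steps carry over to two dimensions, which is what the nearest-neighbor search needs. As a rough sketch using the sample X and u from this thread (the intermediate names diff, sq and dists are only for illustration), each stage looks like this:

```python
import numpy as np

X = np.array([[1, 2], [1, 5], [0, 0], [7, 7]])  # 4 points, 2 features each
u = np.array([1, 5.5])                           # the query vector

diff = X - u               # u is broadcast across every row of X
sq = diff ** 2             # element-wise square, still shape (4, 2)
dists = sq.sum(axis=1)     # sum over the feature axis, one value per point

# Squared distances per row: 12.25, 0.25, 31.25, 38.25
print(diff.shape, dists.shape)
print(dists.argmin())      # 1
```

Summing over axis=1 is what collapses each row to a single distance; summing over axis=0 would instead give one total per feature, which is not what you want here.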
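Do you have to use the npdistance function?
No, I don't really need to.
Thanks for your answer, it works, but somehow it is slower than the Python version? Here is the timing plot. That shouldn't happen, right?
@BlueMango Not sure how you got those results. When I test it with 300 pairs in X, the numpy version takes 20.8 µs ± 543 ns per loop vs 2.47 ms ± 25.8 µs per loop. The difference only increases as the arrays grow.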
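If you want to check the timings yourself, something like the following should do. It is only a sketch: the random data, the 300 x 2 shape, the squared Euclidean distance and the helper names nearest_loop and nearest_vec are all assumptions, and the numbers will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((300, 2))   # assumed test data: 300 random 2-D points
u = rng.random(2)

def nearest_loop(u, X):
    # Plain Python loop, one distance computation per row
    best_i, best_d = -1, float("inf")
    for i, x in enumerate(X):
        d = np.sum((u - x) ** 2)
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def nearest_vec(u, X):
    # All distances in one vectorized expression
    return int(np.argmin(np.sum((X - u) ** 2, axis=1)))

# Both versions should agree before their speed is compared
assert nearest_loop(u, X) == nearest_vec(u, X)

runs = 100
t_loop = timeit.timeit(lambda: nearest_loop(u, X), number=runs)
t_vec = timeit.timeit(lambda: nearest_vec(u, X), number=runs)
print(f"loop:       {t_loop / runs * 1e6:.1f} µs per call")
print(f"vectorized: {t_vec / runs * 1e6:.1f} µs per call")
```

If the vectorized version comes out slower in your plot, it is worth checking that the same inputs are being timed and that no Python-level loop is still hiding inside the distance function.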