Python 如何矢量化a';对于';在三维numpy数组上调用函数(以二维数组为参数)的循环
我有一个numpy数组,其中包含点云中k邻域(k=10)点的XYZ坐标:Python 如何矢量化a';对于';在三维numpy数组上调用函数(以二维数组为参数)的循环,python,arrays,numpy,vectorization,Python,Arrays,Numpy,Vectorization,我有一个numpy数组,其中包含点云中k邻域(k=10)点的XYZ坐标: k_neighboors Out[53]: array([[[ 2.51508147e-01, 5.60274944e-02, 1.98303187e+00], [ 2.48552352e-01, 5.95569573e-02, 1.98319519e+00], [ 2.56611764e-01, 5.36767729e-02, 1.98236740e+00]
k_neighboors
Out[53]:
array([[[ 2.51508147e-01, 5.60274944e-02, 1.98303187e+00],
[ 2.48552352e-01, 5.95569573e-02, 1.98319519e+00],
[ 2.56611764e-01, 5.36767729e-02, 1.98236740e+00],
...,
[ 2.54520357e-01, 6.23480231e-02, 1.98255634e+00],
[ 2.57603496e-01, 5.19787706e-02, 1.98221457e+00],
[ 2.43914440e-01, 5.68424985e-02, 1.98352253e+00]],
[[ 9.72352773e-02, 2.06699912e-02, 1.99344850e+00],
[ 9.91205871e-02, 2.36056261e-02, 1.99329960e+00],
[ 9.59625840e-02, 1.71508361e-02, 1.99356234e+00],
...,
[ 1.03216261e-01, 2.19752081e-02, 1.99304521e+00],
[ 9.65025574e-02, 1.44127617e-02, 1.99355054e+00],
[ 9.59930867e-02, 2.72080526e-02, 1.99344873e+00]],
[[ 1.76408485e-01, 2.81930678e-02, 1.98819435e+00],
[ 1.78670138e-01, 2.81904750e-02, 1.98804617e+00],
[ 1.80372953e-01, 3.05109434e-02, 1.98791444e+00],
...,
[ 1.81960404e-01, 2.47725621e-02, 1.98785996e+00],
[ 1.74499243e-01, 3.50728296e-02, 1.98826015e+00],
[ 1.83470801e-01, 2.70808022e-02, 1.98774099e+00]],
...,
[[ 1.78178743e-01, -4.60980982e-02, -1.98792374e+00],
[ 1.77953839e-01, -4.73701134e-02, -1.98792756e+00],
[ 1.77889392e-01, -4.75468598e-02, -1.98793030e+00],
...,
[ 1.79924294e-01, -5.08776568e-02, -1.98772371e+00],
[ 1.76720902e-01, -5.11409082e-02, -1.98791265e+00],
[ 1.83644593e-01, -4.64747548e-02, -1.98756230e+00]],
[[ 2.00245917e-01, -2.33091787e-03, -1.98685515e+00],
[ 2.02384919e-01, -5.60011715e-04, -1.98673022e+00],
[ 1.97325528e-01, -1.03301927e-03, -1.98705769e+00],
...,
[ 1.95464164e-01, -6.23105839e-03, -1.98713481e+00],
[ 1.98985338e-01, -8.39920342e-03, -1.98688531e+00],
[ 1.95959195e-01, 2.68006674e-03, -1.98713303e+00]],
[[ 1.28851235e-01, -3.24527062e-02, -1.99127460e+00],
[ 1.26415789e-01, -3.27731185e-02, -1.99143147e+00],
[ 1.25985757e-01, -3.24910432e-02, -1.99146211e+00],
...,
[ 1.28296465e-01, -3.92388329e-02, -1.99117136e+00],
[ 1.34895295e-01, -3.64872888e-02, -1.99083793e+00],
[ 1.29047096e-01, -3.97952795e-02, -1.99111152e+00]]])
使用此形状:
k_neighboors.shape
Out[54]: (2999986, 10, 3)
我有一个函数,它将主成分分析应用于一些作为二维数组提供的数据:
def PCA(data, correlation=False, sort=True):
""" Applies Principal Component Analysis to the data
Parameters
----------
data: array
The array containing the data. The array must have NxM dimensions, where each
of the N rows represents a different individual record and each of the M columns
represents a different variable recorded for that individual record.
array([
[V11, ... , V1m],
...,
[Vn1, ... , Vnm]])
correlation(Optional) : bool
Set the type of matrix to be computed (see Notes):
If True compute the correlation matrix.
If False(Default) compute the covariance matrix.
sort(Optional) : bool
Set the order that the eigenvalues/vectors will have
If True(Default) they will be sorted (from higher value to less).
If False they won't.
Returns
-------
eigenvalues: (1,M) array
The eigenvalues of the corresponding matrix.
eigenvector: (M,M) array
The eigenvectors of the corresponding matrix.
Notes
-----
The correlation matrix is a better choice when there are different magnitudes
representing the M variables. Use covariance matrix in any other case.
"""
#: get the mean of all variables
mean = np.mean(data, axis=0, dtype=np.float64)
#: adjust the data by substracting the mean to each variable
data_adjust = data - mean
#: compute the covariance/correlation matrix
#: the data is transposed due to np.cov/corrcoef sintaxis
if correlation:
matrix = np.corrcoef(data_adjust.T)
else:
matrix = np.cov(data_adjust.T)
#: get the eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(matrix)
if sort:
#: sort eigenvalues and eigenvectors
sort = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[sort]
eigenvectors = eigenvectors[:,sort]
return eigenvalues, eigenvectors
因此,问题是:我如何在299986 10x3阵列中的每一个阵列上应用上面提到的PCA函数,而这种方法永远不会像下面这样:
data = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
w, v = PCA(k_neighboors[i])
data[i] = v[:,2]
break #: I break the loop in order to don't have to wait for ever.
data
Out[64]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])
感谢@Divakar和@Eelco的评论 使用divaker post的函数 根据Eelco在评论中指出的,我最终得出了这个结论
k_neighboors.shape
Out[48]: (2999986, 10, 3)
#: THE (ASSUMED)VECTORIZED ANSWER
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]
data
Out[50]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[-0.0632175 , 0.01613551, 0.99786933],
[-0.06449399, 0.00552943, 0.99790278],
[-0.06081954, 0.01802078, 0.99798609]])
Wich给出了与for循环相同的结果,不会花费很长时间(尽管仍然需要一段时间):
我不知道是否有更好的方法来解决这个问题,所以我将保留这个问题。。非常接近,如果不是重复的话:。因此,将前面的链接与此混合使用应该会让您更接近(如果不是在家的话)。请注意,np.linalg.eig接受[…,m,n]作为输入形状;也就是说,您可以将此调用矢量化到所有子矩阵上。PCA函数中的所有步骤都是如此;每个都有一个矢量化的等价物;你只需要写出来:)Thaks对于@Dikavar的评论,我发现你的np.einsum函数非常棒,认为理解它的事实超过了我糟糕的编程知识。最后,我用你的函数(假设)回答了我自己的问题,但我不知道它是否正确。我想听听你对我的答案的看法。谢谢你指出@Eelco,我在我(假设)的答案中使用它。我想知道你对此的看法(答案);我敢肯定这是你最好的选择。PCA从一开始就不是一个便宜的操作,所以我预计做300万次需要一些时间。
k_neighboors.shape
Out[48]: (2999986, 10, 3)
#: THE (ASSUMED)VECTORIZED ANSWER
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]
data
Out[50]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[-0.0632175 , 0.01613551, 0.99786933],
[-0.06449399, 0.00552943, 0.99790278],
[-0.06081954, 0.01802078, 0.99798609]])
data2 = np.empty((2999986, 3))
for i in range(len(k_neighboors)):
if i > 10:
break #: I break the loop in order to don't have to wait for ever.
w, v = PCA(k_neighboors[i])
data2[i] = v[:,2]
data2
Out[52]:
array([[ 0.10530792, 0.01028906, 0.99438643],
[ 0.06462 , 0.00944352, 0.99786526],
[ 0.0654035 , 0.00860751, 0.99782177],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]])