Python: how to vectorize a 'for' loop that calls a function (taking a 2D array as argument) over a 3D numpy array


I have a numpy array that contains the XYZ coordinates of the k-neighbors (k=10) of the points in a point cloud:

k_neighboors
Out[53]: 
array([[[  2.51508147e-01,   5.60274944e-02,   1.98303187e+00],
        [  2.48552352e-01,   5.95569573e-02,   1.98319519e+00],
        [  2.56611764e-01,   5.36767729e-02,   1.98236740e+00],
        ..., 
        [  2.54520357e-01,   6.23480231e-02,   1.98255634e+00],
        [  2.57603496e-01,   5.19787706e-02,   1.98221457e+00],
        [  2.43914440e-01,   5.68424985e-02,   1.98352253e+00]],

       [[  9.72352773e-02,   2.06699912e-02,   1.99344850e+00],
        [  9.91205871e-02,   2.36056261e-02,   1.99329960e+00],
        [  9.59625840e-02,   1.71508361e-02,   1.99356234e+00],
        ..., 
        [  1.03216261e-01,   2.19752081e-02,   1.99304521e+00],
        [  9.65025574e-02,   1.44127617e-02,   1.99355054e+00],
        [  9.59930867e-02,   2.72080526e-02,   1.99344873e+00]],

       [[  1.76408485e-01,   2.81930678e-02,   1.98819435e+00],
        [  1.78670138e-01,   2.81904750e-02,   1.98804617e+00],
        [  1.80372953e-01,   3.05109434e-02,   1.98791444e+00],
        ..., 
        [  1.81960404e-01,   2.47725621e-02,   1.98785996e+00],
        [  1.74499243e-01,   3.50728296e-02,   1.98826015e+00],
        [  1.83470801e-01,   2.70808022e-02,   1.98774099e+00]],

       ..., 
       [[  1.78178743e-01,  -4.60980982e-02,  -1.98792374e+00],
        [  1.77953839e-01,  -4.73701134e-02,  -1.98792756e+00],
        [  1.77889392e-01,  -4.75468598e-02,  -1.98793030e+00],
        ..., 
        [  1.79924294e-01,  -5.08776568e-02,  -1.98772371e+00],
        [  1.76720902e-01,  -5.11409082e-02,  -1.98791265e+00],
        [  1.83644593e-01,  -4.64747548e-02,  -1.98756230e+00]],

       [[  2.00245917e-01,  -2.33091787e-03,  -1.98685515e+00],
        [  2.02384919e-01,  -5.60011715e-04,  -1.98673022e+00],
        [  1.97325528e-01,  -1.03301927e-03,  -1.98705769e+00],
        ..., 
        [  1.95464164e-01,  -6.23105839e-03,  -1.98713481e+00],
        [  1.98985338e-01,  -8.39920342e-03,  -1.98688531e+00],
        [  1.95959195e-01,   2.68006674e-03,  -1.98713303e+00]],

       [[  1.28851235e-01,  -3.24527062e-02,  -1.99127460e+00],
        [  1.26415789e-01,  -3.27731185e-02,  -1.99143147e+00],
        [  1.25985757e-01,  -3.24910432e-02,  -1.99146211e+00],
        ..., 
        [  1.28296465e-01,  -3.92388329e-02,  -1.99117136e+00],
        [  1.34895295e-01,  -3.64872888e-02,  -1.99083793e+00],
        [  1.29047096e-01,  -3.97952795e-02,  -1.99111152e+00]]])

With this shape:

k_neighboors.shape
Out[54]: (2999986, 10, 3)

I have a function that applies Principal Component Analysis to data supplied as a 2D array:

import numpy as np


def PCA(data, correlation=False, sort=True):
    """ Applies Principal Component Analysis to the data
    
    Parameters
    ----------        
    data: array
        The array containing the data. The array must have NxM dimensions, where each
        of the N rows represents a different individual record and each of the M columns
        represents a different variable recorded for that individual record.
            array([
            [V11, ... , V1m],
            ...,
            [Vn1, ... , Vnm]])
    
    correlation(Optional) : bool
            Set the type of matrix to be computed (see Notes):
                If True compute the correlation matrix.
                If False(Default) compute the covariance matrix. 
                
    sort(Optional) : bool
            Set the order that the eigenvalues/vectors will have
                If True(Default) they will be sorted (from highest value to lowest).
                If False they won't.   
    Returns
    -------
    eigenvalues: (M,) array
        The eigenvalues of the corresponding matrix.
        
    eigenvector: (M,M) array
        The eigenvectors of the corresponding matrix.
    
    Notes
    -----
    The correlation matrix is a better choice when there are different magnitudes
    representing the M variables. Use covariance matrix in any other case.
    
    """
    
    #: get the mean of all variables
    mean = np.mean(data, axis=0, dtype=np.float64)
    
    #: adjust the data by subtracting the mean from each variable
    data_adjust = data - mean
    
    #: compute the covariance/correlation matrix
    #: the data is transposed because np.cov/np.corrcoef expect one variable per row
    if correlation:
        matrix = np.corrcoef(data_adjust.T)
    else:
        matrix = np.cov(data_adjust.T) 
    
    #: get the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    
    if sort:
        #: sort eigenvalues and eigenvectors
        order = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[order]
        eigenvectors = eigenvectors[:,order]
    
    return eigenvalues, eigenvectors

So, the question is: how can I apply the PCA function above to each of the 2999986 10x3 arrays in a way that doesn't take forever, unlike this:

data = np.empty((2999986, 3))

for i in range(len(k_neighboors)):
    w, v = PCA(k_neighboors[i])
    data[i] = v[:,2]
    break #:   I break the loop so that I don't have to wait forever.


data
Out[64]: 
array([[ 0.10530792,  0.01028906,  0.99438643],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       ..., 
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ]])

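Thanks to @Divakar and @Eelco for their comments.

Using the function from Divakar's post (called vectorized_app in the code below), and following what Eelco pointed out in his comment, I ended up with this: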

k_neighboors.shape
Out[48]: (2999986, 10, 3)

#: THE (ASSUMED)VECTORIZED ANSWER
data = np.linalg.eig(vectorized_app(k_neighboors))[1][:,:,2]

data
Out[50]: 
array([[ 0.10530792,  0.01028906,  0.99438643],
       [ 0.06462   ,  0.00944352,  0.99786526],
       [ 0.0654035 ,  0.00860751,  0.99782177],
       ..., 
       [-0.0632175 ,  0.01613551,  0.99786933],
       [-0.06449399,  0.00552943,  0.99790278],
       [-0.06081954,  0.01802078,  0.99798609]])

Which gives the same results as the for loop, without taking forever (although it still takes a while).

I don't know whether there is a better way to do this, so I'll keep the question open.
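One caveat about the one-liner above: np.linalg.eig does not promise any particular eigenvalue order, so column 2 is not guaranteed to be the eigenvector of the smallest eigenvalue, whereas PCA() sorts before taking v[:,2]. The following is a minimal sketch of a selection that does not rely on the order. It assumes the input is a stack of symmetric covariance matrices of shape (N, 3, 3); the name smallest_eigenvectors and the random test data are made up for illustration.

```python
import numpy as np

def smallest_eigenvectors(matrices):
    """For a stack of symmetric matrices, shape (N, M, M), return the
    eigenvector paired with the smallest eigenvalue of each, shape (N, M)."""
    eigenvalues, eigenvectors = np.linalg.eig(matrices)
    # position of the smallest eigenvalue in every matrix
    smallest = np.argmin(eigenvalues, axis=-1)
    # eigenvectors are stored as columns, so pick column smallest[i] of matrix i
    return eigenvectors[np.arange(len(matrices)), :, smallest]

# made-up data: covariance matrices of 5 random 10x3 neighborhoods
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 10, 3))
cov = np.stack([np.cov(p.T) for p in points])
normals = smallest_eigenvectors(cov)
print(normals.shape)  # (5, 3)
```

The returned vectors may differ in sign from what the loop produces, since an eigenvector is only defined up to sign.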

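Very close, if not a duplicate: [link]. So, mixing the previous link with this one should get you closer, if not all the way home.

Note that np.linalg.eig accepts input of shape (..., M, M); that is, you can vectorize this call over all the submatrices. The same is true of every step in your PCA function; each has a vectorized equivalent; you just need to write it out :)

Thanks for the comment, @Divakar. I find your np.einsum function awesome, even though understanding it is beyond my poor programming knowledge. In the end I used your function in my (assumed) answer to my own question, but I don't know whether it is correct. I'd like to hear your opinion on my answer.

Thanks for pointing that out, @Eelco. I use it in my (assumed) answer. I'd like to know your opinion on it (the answer).

I'm pretty sure this is your best option. PCA is not a cheap operation to begin with, so I'd expect doing it 3 million times to take some time.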
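To illustrate the comment about every PCA step having a vectorized equivalent, here is a hedged sketch of the whole pipeline applied to an (N, K, 3) array in one go. It is not Divakar's vectorized_app, which is not reproduced on this page; batched_pca_normals is a hypothetical name. It swaps np.linalg.eig for np.linalg.eigh, which is intended for symmetric matrices and returns eigenvalues in ascending order, and it covers only the covariance case (correlation=False).

```python
import numpy as np

def batched_pca_normals(neighborhoods):
    """For an (N, K, 3) array of neighborhoods, return an (N, 3) array with
    the eigenvector of the smallest covariance eigenvalue of each one."""
    pts = np.asarray(neighborhoods, dtype=np.float64)
    k = pts.shape[1]
    # center every neighborhood on its own mean (broadcasts over N)
    centered = pts - pts.mean(axis=1, keepdims=True)
    # (N, 3, 3) covariance matrices; dividing by k - 1 matches np.cov's default
    cov = np.einsum('nki,nkj->nij', centered, centered) / (k - 1)
    # eigh processes the whole stack at once, eigenvalues in ascending order
    _, eigenvectors = np.linalg.eigh(cov)
    # column 0 therefore belongs to the smallest eigenvalue
    return eigenvectors[:, :, 0]

# made-up stand-in for k_neighboors
rng = np.random.default_rng(1)
sample = rng.normal(size=(20, 10, 3))
normals = batched_pca_normals(sample)
print(normals.shape)  # (20, 3)
```

For an array as large as the one in the question, the intermediate `centered` copy is the same size as the input, so processing it in slices may be needed if memory is tight.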

#: for comparison: the original loop, stopped after the first 11 neighborhoods
data2 = np.empty((2999986, 3))

for i in range(len(k_neighboors)):
    if i > 10:
        break #:   stop early so that I don't have to wait forever
    w, v = PCA(k_neighboors[i])
    data2[i] = v[:,2]


data2
Out[52]: 
array([[ 0.10530792,  0.01028906,  0.99438643],
       [ 0.06462   ,  0.00944352,  0.99786526],
       [ 0.0654035 ,  0.00860751,  0.99782177],
       ..., 
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ]])
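The comparison above is done by eye on the printed rows. Because an eigenvector is only defined up to sign, a direct np.allclose between the two results could report a mismatch even when both are valid. A small sketch of a sign-insensitive check follows; same_up_to_sign and the two example arrays are hypothetical, and the rows are assumed to be unit vectors.

```python
import numpy as np

def same_up_to_sign(a, b, tol=1e-8):
    """Row-wise check that the unit vectors in `a` and `b` are parallel,
    regardless of their sign."""
    # row-wise dot products; +1 or -1 means the two vectors are parallel
    dots = np.einsum('ij,ij->i', a, b)
    return np.abs(np.abs(dots) - 1.0) < tol

a = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
print(same_up_to_sign(a, b))  # [ True False]
```

Applied to the first rows of data and data2, this would confirm the match without depending on which sign each routine happened to return.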