Numpy: making my Cython code more efficient

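I have written a Python program that I am trying to cythonize. Are there any suggestions for making the for loop more efficient, since it takes 99% of the run time?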

This is the for loop:

    for i in range(l):
        b1[i] = np.nanargmin(locator[i,:]) # Closer point
        locator[i, b1[i]] = NAN # Do not consider Closer point
        b2[i] = np.nanargmin(locator[i,:]) # 2nd Closer point
        Adjacents[i,0] = np.array((Existed_Pips[b1[i]]), dtype=np.double)
        Adjacents[i,1] = np.array((Existed_Pips[b2[i]]), dtype=np.double)
Here is the rest of the code:

import numpy as np
cimport numpy as np
from libc.math cimport NAN #, isnan

def PIPs(np.ndarray[np.double_t, ndim=1, mode='c'] ys, unsigned int nofPIPs, unsigned int typeofdist):
    cdef:
        unsigned int currentstate, j, i
        np.ndarray[np.double_t, ndim=1, mode="c"] D
        np.ndarray[np.int64_t, ndim=1, mode="c"] Existed_Pips
        np.ndarray[np.int_t, ndim=1, mode="c"] xs
        np.ndarray[np.double_t, ndim=2] Adjacents, locator, Adjy, Adjx, Raw_Fire_PIPs, Raw_Fem_PIPs
        np.ndarray[np.int_t, ndim=2, mode="c"] PIP_points, b1, b2

    cdef unsigned int l = len(ys)
    xs = np.arange(0,l, dtype=np.int) # Column vector with xs
    PIP_points = np.zeros((l,1), dtype=np.int) # Binary indexation
    PIP_points[0] = 1 # One indicate the PIP points.The first two PIPs are the first and the last observation.
    PIP_points[-1] = 1
    Adjacents = np.zeros((l,2), dtype=np.double)
    currentstate = 2 # Initial PIPs

    while currentstate <= nofPIPs: #    for eachPIPs in range(nofPIPs)
        Existed_Pips = np.flatnonzero(PIP_points)
        currentstate = len(Existed_Pips)
        locator = np.full((l,currentstate), NAN, dtype=np.double) #np.int*
        for j in range(currentstate):
            locator[:,j] = np.absolute(xs-Existed_Pips[j])
        b1 = np.zeros((l,1), dtype=np.int)
        b2 = np.zeros((l,1), dtype=np.int)
        for i in range(l):
            b1[i] = np.nanargmin(locator[i,:]) # Closer point
            locator[i, b1[i]] = NAN # Do not consider Closer point
            b2[i] = np.nanargmin(locator[i,:]) # 2nd Closer point
            Adjacents[i,0] = np.array((Existed_Pips[b1[i]]), dtype=np.double)
            Adjacents[i,1] = np.array((Existed_Pips[b2[i]]), dtype=np.double)

        ##Calculate Distance
        Adjx = Adjacents        
        Adjy = np.array([ys[np.array(Adjacents[:,0], dtype=np.int)], ys[np.array(Adjacents[:,1], dtype=np.int)]]).transpose()
        Adjx[Existed_Pips,:] = NAN # Existed PIPs are not candidates for new PIP.
        Adjy[Existed_Pips,:] = NAN

        if typeofdist == 1: #Euclidean Distance
            ##[D] = EDist(ys,xs,Adjx,Adjy)
            ED = np.power(np.power((Adjx[:,1]-xs),2) + np.power((Adjy[:,1]-ys),2),(0.5)) + np.power(np.power((Adjx[:,0]-xs),2) + np.power((Adjy[:,0]-ys),2),(0.5))

        EDmax = np.nanargmax(ED)
        PIP_points[EDmax]=1

        currentstate=currentstate+1

    return np.array([Existed_Pips, ys[Existed_Pips]]).transpose()

A few suggestions:

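  • Move the calls to np.nanargmin out of the loop: with the axis argument it can operate on the whole array at once. This reduces the number of Python function calls that have to be made: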

    b1 = np.nanargmin(locator,axis=1)
    locator[np.arange(locator.shape[0]),b1] = np.nan
    b2 = np.nanargmin(locator,axis=1)
    
  • Your assignment to Adjacents is odd: you seem to create a length-1 array for the right-hand side first. Just do

    Adjacents[i,0] = Existed_Pips[b1[i]]
    # ...
    
    However, in this case you can also move both lines out of the loop, which eliminates the loop entirely:

    Adjacents = np.vstack((Existed_Pips[b1], Existed_Pips[b2])).T
    

  • All of this relies on numpy rather than Cython for the speed-up, but it will probably beat your version.
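
To check that the vectorized form gives the same result as the original loop, a small pure-Python sketch can compare the two on toy data. The helper names two_nearest_loop and two_nearest_vectorized and the sample arrays are made up for illustration and are not part of the question's code. The final astype(np.double) is an assumption, added because Adjacents is declared as a double buffer in the question while np.vstack on integer indices returns an integer array.

```python
import numpy as np


def two_nearest_loop(locator, existed_pips):
    """Row-by-row version, mirroring the loop in the question."""
    locator = locator.copy()  # the loop overwrites entries with NaN
    n = locator.shape[0]
    adjacents = np.zeros((n, 2), dtype=np.double)
    for i in range(n):
        b1 = np.nanargmin(locator[i, :])   # closest existing PIP
        locator[i, b1] = np.nan            # exclude it from the next search
        b2 = np.nanargmin(locator[i, :])   # second closest existing PIP
        adjacents[i, 0] = existed_pips[b1]
        adjacents[i, 1] = existed_pips[b2]
    return adjacents


def two_nearest_vectorized(locator, existed_pips):
    """Whole-array version using the axis argument and fancy indexing."""
    locator = locator.copy()
    rows = np.arange(locator.shape[0])
    b1 = np.nanargmin(locator, axis=1)
    locator[rows, b1] = np.nan             # mask one entry per row
    b2 = np.nanargmin(locator, axis=1)
    # Assumption: cast to double to match the declared dtype of Adjacents.
    return np.vstack((existed_pips[b1], existed_pips[b2])).T.astype(np.double)


# Toy data (hypothetical): 10 observations, PIPs at positions 0, 4 and 9.
xs = np.arange(10)
existed_pips = np.array([0, 4, 9])
locator = np.abs(xs[:, None] - existed_pips[None, :]).astype(np.double)

loop_result = two_nearest_loop(locator, existed_pips)
vec_result = two_nearest_vectorized(locator, existed_pips)
print(np.array_equal(loop_result, vec_result))
```

Both versions need at least two existing PIPs per row, otherwise the second np.nanargmin sees an all-NaN row and raises a ValueError. Ties are resolved the same way in both, since np.nanargmin returns the first minimum it finds.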

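    Thank you very much! Over multiple runs on some large data sets, the run time dropped from over 55 seconds to 5.8 seconds, so it works very well! That is much better than I expected. I think that will do just fine.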