Python/Cython/NumPy optimization of np.nonzero

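I have a piece of code that I am trying to optimize. Most of the execution time is spent in
cdef np.ndarray index=np.argwhere(array==1)
where array is a 512x512x512 NumPy array of 0s and 1s. Any thoughts on how to speed this up? I am using Python 2.7 and NumPy 1.8.1.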

Sphericity function

def sphericity(self,array):

    #Pass an mask array (1's are marked, 0's ignored)
    cdef np.ndarray index = np.argwhere(array==1)
    cdef int xSize,ySize,zSize
    xSize,ySize,zSize=array.shape

    cdef int sa,vol,voxelIndex,x,y,z,neighbors,xDiff,yDiff,zDiff,x1,y1,z1
    cdef float onethird,twothirds,sp
    sa=vol=0 #keep running tally of volume and surface area
    #cdef int nonZeroCount = (array != 0).sum() #Replaces np.count_nonzero(array) for speed
    for voxelIndex in range(np.count_nonzero(array)):
    #for voxelIndex in range(nonZeroCount):
        x=index[voxelIndex,0]
        y=index[voxelIndex,1]
        z=index[voxelIndex,2]
        #print x,y,z,array[x,y,z]
        neighbors=0
        vol+=1

        for xDiff in [-1,0,1]:
            for yDiff in [-1,0,1]:
                for zDiff in [-1,0,1]:
                    if abs(xDiff)+abs(yDiff)+abs(zDiff)==1:
                        x1=x+xDiff
                        y1=y+yDiff
                        z1=z+zDiff
                        if x1>=0 and y1>=0 and z1>=0 and x1<xSize and y1<ySize and z1<zSize:
                            #print '-',x1,y1,z1,array[x1,y1,z1]
                            if array[x1,y1,z1]:
                                #print '-',x1,y1,z1,array[x1,y1,z1]
                                neighbors+=1

        #print 'had this many neighbors',neighbors
        sa+=(6-neighbors)

    onethird=float(1)/float(3)
    twothirds=float(2)/float(3)
    sph = ((np.pi**onethird)*((6*vol)**twothirds)) / sa
    #print 'sphericity',sphericity
    return sph
Profiling output

Mon Oct 06 11:49:57 2014    Profile.prof

         12 function calls in 4.373 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.045    3.045    4.373    4.373 <string>:1(<module>)
        1    1.025    1.025    1.025    1.025 {method 'nonzero' of 'numpy.ndarray' objects}
        2    0.302    0.151    0.302    0.151 {numpy.core.multiarray.array}
        1    0.001    0.001    1.328    1.328 numeric.py:731(argwhere)
        1    0.000    0.000    0.302    0.302 fromnumeric.py:492(transpose)
        1    0.000    0.000    0.302    0.302 fromnumeric.py:38(_wrapit)
        1    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.302    0.302 numeric.py:392(asarray)
        1    0.000    0.000    0.000    0.000 numeric.py:462(asanyarray)
        1    0.000    0.000    0.000    0.000 {getattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
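
The profile above attributes most of the time inside argwhere to nonzero, with the rest going to the array conversion done around transpose. As a minimal sketch of that relationship (the small test array `a` is made up for illustration), the rows returned by np.argwhere match the transposed output of np.nonzero:

```python
import numpy as np

# Hypothetical tiny mask with two marked voxels.
a = np.zeros((3, 4, 5), dtype=np.uint8)
a[0, 1, 2] = 1
a[2, 3, 4] = 1

coords_per_axis = np.nonzero(a == 1)     # tuple of 3 arrays, one per axis
rows = np.transpose(coords_per_axis)     # shape (n_nonzero, 3), one row per voxel

# Both calls describe the same voxels in the same order.
print(np.array_equal(rows, np.argwhere(a == 1)))  # True
```

So the cost of argwhere is roughly the cost of nonzero plus one extra copy of the index data.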

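You can get most of what your code does from vanilla NumPy, without needing Cython. The centerpiece is finding an efficient way of counting neighbors, which can be done by AND-ing (with &) slices of a mask obtained from the input array. Putting it all together, I think the following code does the same as yours, but with much less repetition: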

def sphericity(arr):
    mask = arr != 0
    vol = np.count_nonzero(mask)
    counts = np.zeros_like(arr, dtype=np.intp)
    for dim, size in enumerate(arr.shape):
        slc = (slice(None),) * dim
        axis_mask = (mask[slc + (slice(None, -1),)] &
                     mask[slc + (slice(1, None),)])
        counts[slc + (slice(None, -1),)] += axis_mask
        counts[slc + (slice(1, None),)] += axis_mask
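    # Select with mask rather than counts != 0, so that isolated voxels
    # (zero neighbors) still contribute their 6 exposed faces.
    sa = np.sum(6 - counts[mask])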

    return np.pi**(1./3.)*(6*vol)**(2./3.) / sa
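
As a quick sanity check of the slicing idea, the following sketch applies the same neighbor counting to three tiny masks whose exposed-face counts can be worked out by hand. The helper name surface_and_volume and the test arrays are illustrative assumptions, not part of the answer above; the sketch indexes counts with the mask itself so that a voxel with no neighbors still contributes six faces.

```python
import numpy as np

def surface_and_volume(arr):
    """Return (exposed_faces, voxel_count) for a 0/1 mask array."""
    mask = arr != 0
    vol = int(np.count_nonzero(mask))
    counts = np.zeros(arr.shape, dtype=np.intp)
    for dim in range(arr.ndim):
        lead = (slice(None),) * dim
        lo = lead + (slice(None, -1),)  # all but the last element along dim
        hi = lead + (slice(1, None),)   # all but the first element along dim
        both = mask[lo] & mask[hi]      # True where two adjacent voxels are both set
        counts[lo] += both              # the lower voxel gains a neighbor
        counts[hi] += both              # the upper voxel gains a neighbor
    sa = int(np.sum(6 - counts[mask]))
    return sa, vol

single = np.zeros((3, 3, 3), dtype=np.uint8)
single[1, 1, 1] = 1                     # one isolated voxel

pair = np.zeros((3, 3, 3), dtype=np.uint8)
pair[1, 1, 1] = 1
pair[1, 1, 2] = 1                       # two voxels sharing one face

cube = np.ones((2, 2, 2), dtype=np.uint8)  # every voxel has 3 neighbors

print(surface_and_volume(single))  # (6, 1)
print(surface_and_volume(pair))    # (10, 2)
print(surface_and_volume(cube))    # (24, 8)
```

The sphericity then follows from these two numbers as np.pi**(1./3.) * (6*vol)**(2./3.) / sa.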

Jaime has probably given a good answer, but I will comment on improving the Cython code and add a performance comparison.

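First, you should use the "annotate" feature, cython -a filename.pyx, which generates an HTML file. Load it in a browser and it highlights "slow" lines in yellow-orange, which shows where improvements can be made.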

Annotate immediately reveals two things that are easy to fix:

Convert idioms into something Cython understands

First, these lines are slow:

        for xDiff in [-1,0,1]:
            for yDiff in [-1,0,1]:
                for zDiff in [-1,0,1]:
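The reason is that Cython does not know how to turn iteration over a list into clean C code. It needs to be converted into equivalent code that Cython can optimize, namely the "in range" form: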

        for xDiff in range(-1, 2):
            for yDiff in range(-1, 2):
                for zDiff in range(-1, 2):
Type the array for fast indexing

The next thing is that this line is slow:

                            if array[x1,y1,z1]:
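This is because array has no type specified, so it uses Python-level indexing instead of C-level indexing. To fix this, you need to give the array a type, which can be done as follows: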

def sphericity(np.ndarray[np.uint8_t, ndim=3] array):
This assumes the array's dtype is 'uint8'; replace it with the appropriate type (note: Cython does not support the 'np.bool' type, which is why I use 'uint8').
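
If the mask starts out as a boolean array (for example, the result of a comparison), it has to be converted before being passed to a function typed as uint8. A minimal sketch of two ways to do that in NumPy follows; the array names and sizes are made up for illustration:

```python
import numpy as np

# Hypothetical label volume with a small marked block.
labels = np.zeros((4, 4, 4), dtype=np.int32)
labels[1:3, 1:3, 1:3] = 1

bool_mask = labels == 1                # dtype is bool

as_copy = bool_mask.astype(np.uint8)   # allocates a new uint8 array
as_view = bool_mask.view(np.uint8)     # reinterprets the same bytes, no copy

print(as_copy.dtype, as_view.dtype)
```

The view avoids copying the whole array, which matters at 512x512x512, but it stays tied to the memory of the boolean array.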

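You can also use a memoryview. You cannot use NumPy functions on a memoryview, but you can create a view on the array and then index the view instead of the array: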

    cdef np.uint8_t[:, :, :] array_view = array
    ...
                                    if array_view[x1,y1,z1]:
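Memoryviews may be slightly faster, and they make a clear distinction between the array (Python-level calls) and the view (C-level calls). If you are not using NumPy functions on it, you can use the memoryview without any problems.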

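Rewrite the code to avoid multiple passes over the array

What remains is that computing index and nonZeroCount is slow. There are various reasons for this, but it mainly comes down to the sheer size of the data (essentially, iterating over 512*512*512 elements just takes time!). Generally speaking, anything NumPy can do, optimized Cython can do faster (typically 2-10 times faster). NumPy just saves you a lot of reinventing the wheel and a lot of typing, and lets you think at a higher level (and if you are not also a C programmer, you may not be able to optimize Cython very well). In this case, though, it is easy: you can eliminate index and nonZeroCount and all the related code, and then just do this: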

    for x in range(0, xSize):
        for y in range(0, ySize):
            for z in range(0, zSize):
                if array[x,y,z] == 0:
                    continue
                ... 
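This is very fast, because C (which clean Cython compiles to flawlessly) has no trouble performing billions of operations per second. By eliminating the index and nonZeroCount steps, you save essentially two full iterations over the whole array, each of which takes at least about 0.1 seconds even at maximum speed. Even more important is the CPU cache: the whole array is 128mb, much larger than the CPU cache, so doing everything in one pass makes better use of the cache (multiple passes would not matter as much if the array fit entirely in the CPU cache).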
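
The 128mb figure follows directly from the array shape, assuming one byte (uint8) per voxel. The sketch below redoes that arithmetic and also estimates the size of the index array that np.argwhere would build; the 50% fill fraction is an arbitrary assumption for illustration only.

```python
import numpy as np

shape = (512, 512, 512)
n_voxels = shape[0] * shape[1] * shape[2]

# One byte per voxel for a uint8 mask.
mask_mib = n_voxels * np.dtype(np.uint8).itemsize / 2.0**20
print(mask_mib)  # 128.0

# np.argwhere builds one row of 3 indices (dtype intp) per nonzero voxel.
fill_fraction = 0.5  # assumed share of nonzero voxels, illustration only
n_nonzero = int(n_voxels * fill_fraction)
index_mib = n_nonzero * 3 * np.dtype(np.intp).itemsize / 2.0**20
print(index_mib)     # platform dependent (intp is 4 or 8 bytes)
```

Under that assumption the index array alone is several times larger than the mask it was derived from, which is one more reason the single-pass loop helps.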

Optimized version

Here is the full code for my optimized version:

#cython: boundscheck=False, nonecheck=False, wraparound=False
import numpy as np
cimport numpy as np

def sphericity2(np.uint8_t [:, :, :] array):

    #Pass an mask array (1's are marked, 0's ignored)
    cdef int xSize,ySize,zSize
    xSize=array.shape[0]
    ySize=array.shape[1]
    zSize=array.shape[2]

    cdef int sa,vol,x,y,z,neighbors,xDiff,yDiff,zDiff,x1,y1,z1
    cdef float onethird,twothirds,sp
    sa=vol=0 #keep running tally of volume and surface area

    for x in range(0, xSize):
        for y in range(0, ySize):
            for z in range(0, zSize):
                if array[x,y,z] == 0:
                    continue

                neighbors=0
                vol+=1

                for xDiff in range(-1, 2):
                    for yDiff in range(-1, 2):
                        for zDiff in range(-1, 2):
                            if abs(xDiff)+abs(yDiff)+abs(zDiff)==1:
                                x1=x+xDiff
                                y1=y+yDiff
                                z1=z+zDiff
                                if x1>=0 and y1>=0 and z1>=0 and x1<xSize and y1<ySize and z1<zSize:
                                    #print '-',x1,y1,z1,array[x1,y1,z1]
                                    if array[x1,y1,z1]:
                                        #print '-',x1,y1,z1,array[x1,y1,z1]
                                        neighbors+=1

                #print 'had this many neighbors',neighbors
                sa+=(6-neighbors)

    onethird=float(1)/float(3)
    twothirds=float(2)/float(3)
    sph = ((np.pi**onethird)*((6*vol)**twothirds)) / sa
    #print 'sphericity',sphericity
    return sph
Performance comparison

Original: 2.123s
Jaime's: 1.819s
Optimized Cython: 0.136s
@moarningsun: 0.090s