Python/Cython/Numpy optimization of np.nonzero
I have a piece of code that I am trying to optimize. Most of the execution time is taken by
cdef np.ndarray index = np.argwhere(array==1)
where array is a 512x512x512 numpy array of 0s and 1s. Any thoughts on speeding this up? I am using Python 2.7 and Numpy 1.8.1.
Sphericity function
def sphericity(self,array):
    #Pass an mask array (1's are marked, 0's ignored)
    cdef np.ndarray index = np.argwhere(array==1)
    cdef int xSize,ySize,zSize
    xSize,ySize,zSize=array.shape
    cdef int sa,vol,voxelIndex,x,y,z,neighbors,xDiff,yDiff,zDiff,x1,y1,z1
    cdef float onethird,twothirds,sp
    sa=vol=0 #keep running tally of volume and surface area
    #cdef int nonZeroCount = (array != 0).sum() #Replaces np.count_nonzero(array) for speed
    for voxelIndex in range(np.count_nonzero(array)):
    #for voxelIndex in range(nonZeroCount):
        x=index[voxelIndex,0]
        y=index[voxelIndex,1]
        z=index[voxelIndex,2]
        #print x,y,z,array[x,y,z]
        neighbors=0
        vol+=1
        for xDiff in [-1,0,1]:
            for yDiff in [-1,0,1]:
                for zDiff in [-1,0,1]:
                    if abs(xDiff)+abs(yDiff)+abs(zDiff)==1:
                        x1=x+xDiff
                        y1=y+yDiff
                        z1=z+zDiff
                        if x1>=0 and y1>=0 and z1>=0 and x1<xSize and y1<ySize and z1<zSize:
                            #print '-',x1,y1,z1,array[x1,y1,z1]
                            if array[x1,y1,z1]:
                                #print '-',x1,y1,z1,array[x1,y1,z1]
                                neighbors+=1
        #print 'had this many neighbors',neighbors
        sa+=(6-neighbors)
    onethird=float(1)/float(3)
    twothirds=float(2)/float(3)
    sph = ((np.pi**onethird)*((6*vol)**twothirds)) / sa
    #print 'sphericity',sphericity
    return sph
Profiling output
Mon Oct 06 11:49:57 2014 Profile.prof
12 function calls in 4.373 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 3.045 3.045 4.373 4.373 <string>:1(<module>)
1 1.025 1.025 1.025 1.025 {method 'nonzero' of 'numpy.ndarray' objects}
2 0.302 0.151 0.302 0.151 {numpy.core.multiarray.array}
1 0.001 0.001 1.328 1.328 numeric.py:731(argwhere)
1 0.000 0.000 0.302 0.302 fromnumeric.py:492(transpose)
1 0.000 0.000 0.302 0.302 fromnumeric.py:38(_wrapit)
1 0.000 0.000 0.000 0.000 {method 'transpose' of 'numpy.ndarray' objects}
1 0.000 0.000 0.302 0.302 numeric.py:392(asarray)
1 0.000 0.000 0.000 0.000 numeric.py:462(asanyarray)
1 0.000 0.000 0.000 0.000 {getattr}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
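The profile shows argwhere spending its time in nonzero and in a transpose/asarray step. As a minimal sketch of why (using a small random stand-in array, which is an assumption for illustration and not the asker's data), the snippet below checks that transposing the output of np.nonzero gives the same result as np.argwhere:

```python
import numpy as np

# Small stand-in for the 512x512x512 mask from the question (illustrative only).
rng = np.random.default_rng(0)
array = (rng.random((8, 8, 8)) > 0.5).astype(np.uint8)

# np.nonzero returns one index array per dimension ...
per_axis = np.nonzero(array == 1)
# ... and transposing them gives one (x, y, z) row per marked voxel,
# which is the same result np.argwhere produces.
index = np.transpose(per_axis)

same = np.array_equal(index, np.argwhere(array == 1))
print(same, index.shape)
```

So the cost listed for argwhere is essentially one full scan of the array plus the work of packing the indices into an (N, 3) array.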
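You can get most of what your code is trying to do from vanilla numpy, with no need for Cython. The central thing is to find an efficient way of counting neighbors, which can be done by and-ing slices of a mask obtained from your input array. Putting it all together, I think the following does the same as your code, but with much less repetition: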
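import numpy as np

def sphericity(arr):
    mask = arr != 0
    vol = np.count_nonzero(mask)
    counts = np.zeros_like(arr, dtype=np.intp)
    for dim, size in enumerate(arr.shape):
        slc = (slice(None),) * dim
        # True where two voxels adjacent along this axis are both marked
        axis_mask = (mask[slc + (slice(None, -1),)] &
                     mask[slc + (slice(1, None),)])
        counts[slc + (slice(None, -1),)] += axis_mask
        counts[slc + (slice(1, None),)] += axis_mask
    # Select marked voxels with the mask (rather than counts != 0) so that
    # isolated voxels with no marked neighbors still contribute 6 faces.
    sa = np.sum(6 - counts[mask])
    return np.pi**(1./3.)*(6*vol)**(2./3.) / sa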
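To see that the slice and-ing trick really counts face neighbors, here is a rough sketch that compares it with a plain loop on a small random mask. The helper names face_neighbor_counts and surface_area_bruteforce are made up for this illustration, and the sketch selects marked voxels through the mask so that isolated voxels are counted too:

```python
import numpy as np

def face_neighbor_counts(mask):
    """Count the marked face neighbors of every voxel using shifted slices."""
    counts = np.zeros(mask.shape, dtype=np.intp)
    for dim in range(mask.ndim):
        lo = (slice(None),) * dim + (slice(None, -1),)
        hi = (slice(None),) * dim + (slice(1, None),)
        both = mask[lo] & mask[hi]  # adjacent pair along this axis, both marked
        counts[lo] += both
        counts[hi] += both
    return counts

def surface_area_bruteforce(mask):
    """Reference loop for a 3D mask: 6 faces per voxel minus marked neighbors."""
    offsets = ((-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1))
    sa = 0
    for x, y, z in np.argwhere(mask):
        neighbors = 0
        for dx, dy, dz in offsets:
            x1, y1, z1 = x + dx, y + dy, z + dz
            if (0 <= x1 < mask.shape[0] and 0 <= y1 < mask.shape[1]
                    and 0 <= z1 < mask.shape[2] and mask[x1, y1, z1]):
                neighbors += 1
        sa += 6 - neighbors
    return sa

rng = np.random.default_rng(1)
mask = rng.random((6, 6, 6)) > 0.6  # illustrative test data
sa_fast = int(np.sum(6 - face_neighbor_counts(mask)[mask]))
sa_slow = surface_area_bruteforce(mask)
print(sa_fast, sa_slow)
```

Both numbers should agree; the vectorized version does the same work with three passes of whole-array operations instead of a Python loop per voxel.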
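Jaime has probably given a good answer, but I will comment on improving the Cython code and add a performance comparison.
First, you should use the 'annotate' feature, cython -a filename.pyx, which generates an HTML file. Load it in a browser and it highlights 'slow' lines in yellow-orange, indicating where improvements can be made.
Annotate immediately reveals two things that are easy to fix:
Convert idioms into something Cython understands
First, these lines are slow:
for xDiff in [-1,0,1]:
    for yDiff in [-1,0,1]:
        for zDiff in [-1,0,1]:
The reason is that Cython does not know how to convert list iteration into clean C code. It needs to be rewritten as equivalent code that Cython can optimize, namely the 'in range' form: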
for xDiff in range(-1, 2):
    for yDiff in range(-1, 2):
        for zDiff in range(-1, 2):
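For what it's worth, range(-1, 2) visits exactly the same values as [-1,0,1], so the rewrite does not change which neighbors are examined. A small plain-Python sketch (the name offsets is just for illustration) listing the offsets that survive the abs(...)==1 test:

```python
# Enumerate the offsets visited by the triple loop and keep only those
# that pass the abs(...) == 1 test used in the question's code.
offsets = [
    (dx, dy, dz)
    for dx in range(-1, 2)
    for dy in range(-1, 2)
    for dz in range(-1, 2)
    if abs(dx) + abs(dy) + abs(dz) == 1
]
print(len(offsets), offsets)
```

Only 6 of the 27 combinations pass the test (the six face neighbors), so looping over those six offsets directly would be another way to write the inner loops.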
Type arrays for fast indexing
The next thing is that this line is slow:
if array[x1,y1,z1]:
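This is because array has not been given a type, so Python-level indexing is used instead of C-level indexing. To fix this you need to give the array a type, which can be done like this: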
def sphericity(np.ndarray[np.uint8_t, ndim=3] array):
This assumes the array's dtype is 'uint8'; substitute the appropriate type. (Note: Cython does not support the 'np.bool' type, which is why I use 'uint8'.)
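If the mask starts out as a boolean array, it has to be handed over as uint8. A minimal sketch of two ways to do that on the numpy side (the small mask here is made-up example data):

```python
import numpy as np

mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True

# Option 1: reinterpret the same memory as uint8 (no copy; bool and uint8
# are both one byte per element).
as_view = mask.view(np.uint8)

# Option 2: make an explicit uint8 copy.
as_copy = mask.astype(np.uint8)

print(as_view.dtype, np.shares_memory(mask, as_view), int(as_copy.sum()))
```

The view avoids duplicating a large array, while the copy is the safer choice if the original mask may be modified afterwards.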
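You can also use a memory view. You cannot call numpy functions on a memory view, but you can create a view on the array and then index the view instead of the array: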
cdef np.uint8_t[:, :, :] array_view = array
...
if array_view[x1,y1,z1]:
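A memory view is probably slightly faster and makes a clear division between the array (Python-level calls) and the view (C-level calls). If you are not using numpy functions, you can use a memory view without any problems.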
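Rewrite the code to avoid multiple passes over the array
What remains is the computation of index and of the non-zero count, both of which are slow. There are various reasons for this, but it mostly comes down to the sheer size of the data (essentially, iterating over 512*512*512 elements simply takes time!).
Generally speaking, anything Numpy can do, optimized Cython can do faster (typically 2-10 times faster). Numpy just saves you a lot of reinventing the wheel and a lot of typing, and lets you think at a higher level (and if you are not also a C programmer, you may not be able to optimize Cython well enough). In this case, though, it is easy: you can eliminate index, the non-zero count, and all related code, and simply do this: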
for x in range(0, xSize):
    for y in range(0, ySize):
        for z in range(0, zSize):
            if array[x,y,z] == 0:
                continue
            ...
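This is extremely fast, because C (which clean Cython compiles down to flawlessly) has no trouble performing billions of operations per second. By eliminating the index and non-zero count steps, you essentially save two full iterations over the entire array, each of which takes at least about 0.1 seconds even at maximum speed. More important still is the CPU cache: the entire array is 128 MB, much larger than the CPU cache, so doing everything in a single pass makes better use of the cache (multiple passes would not matter as much if the array fit entirely in the CPU cache).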
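The 128 MB figure follows directly from the array shape when each element is one byte. A quick back-of-the-envelope sketch in Python 3; the index-array estimate assumes 8-byte indices and, purely for illustration, that half of the voxels are marked:

```python
import numpy as np

shape = (512, 512, 512)
n_voxels = shape[0] * shape[1] * shape[2]           # 134,217,728 elements
bytes_uint8 = n_voxels * np.dtype(np.uint8).itemsize
mib_uint8 = bytes_uint8 / 2**20                      # 128.0

# np.argwhere returns one row of three indices per marked voxel.
# Assumption for illustration: 8-byte indices and half the voxels marked.
bytes_index = (n_voxels // 2) * 3 * np.dtype(np.int64).itemsize
mib_index = bytes_index / 2**20                      # 1536.0

print(mib_uint8, mib_index)
```

Under those assumptions the index array would be several times larger than the mask itself, which is one more reason a single pass over the mask is attractive.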
Optimized version
Here is the full code of my optimized version:
#cython: boundscheck=False, nonecheck=False, wraparound=False
import numpy as np
cimport numpy as np
def sphericity2(np.uint8_t [:, :, :] array):
    #Pass an mask array (1's are marked, 0's ignored)
    cdef int xSize,ySize,zSize
    xSize=array.shape[0]
    ySize=array.shape[1]
    zSize=array.shape[2]
    cdef int sa,vol,x,y,z,neighbors,xDiff,yDiff,zDiff,x1,y1,z1
    cdef float onethird,twothirds,sp
    sa=vol=0 #keep running tally of volume and surface area
    for x in range(0, xSize):
        for y in range(0, ySize):
            for z in range(0, zSize):
                if array[x,y,z] == 0:
                    continue
                neighbors=0
                vol+=1
                for xDiff in range(-1, 2):
                    for yDiff in range(-1, 2):
                        for zDiff in range(-1, 2):
                            if abs(xDiff)+abs(yDiff)+abs(zDiff)==1:
                                x1=x+xDiff
                                y1=y+yDiff
                                z1=z+zDiff
                                if x1>=0 and y1>=0 and z1>=0 and x1<xSize and y1<ySize and z1<zSize:
                                    #print '-',x1,y1,z1,array[x1,y1,z1]
                                    if array[x1,y1,z1]:
                                        #print '-',x1,y1,z1,array[x1,y1,z1]
                                        neighbors+=1
                #print 'had this many neighbors',neighbors
                sa+=(6-neighbors)
    onethird=float(1)/float(3)
    twothirds=float(2)/float(3)
    sph = ((np.pi**onethird)*((6*vol)**twothirds)) / sa
    #print 'sphericity',sphericity
    return sph
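One way to sanity-check a compiled version like this is to compare it with a slow pure-Python reference on tiny inputs. The sketch below is such a reference, written for Python 3; sphericity_reference and FACE_OFFSETS are names made up here and not part of the answer. For a solid cube the formula reduces to (pi/6)**(1/3), about 0.806, whatever the cube size:

```python
import math
import numpy as np

# The six face-neighbor offsets (hypothetical helper constant).
FACE_OFFSETS = ((-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1))

def sphericity_reference(array):
    """Single-pass reference: count volume and exposed faces of marked voxels."""
    xs, ys, zs = array.shape
    sa = vol = 0
    for x in range(xs):
        for y in range(ys):
            for z in range(zs):
                if array[x, y, z] == 0:
                    continue
                vol += 1
                neighbors = 0
                for dx, dy, dz in FACE_OFFSETS:
                    x1, y1, z1 = x + dx, y + dy, z + dz
                    if (0 <= x1 < xs and 0 <= y1 < ys and 0 <= z1 < zs
                            and array[x1, y1, z1]):
                        neighbors += 1
                sa += 6 - neighbors
    return (math.pi ** (1 / 3)) * ((6 * vol) ** (2 / 3)) / sa

single = np.ones((1, 1, 1), dtype=np.uint8)       # one voxel: vol=1, sa=6
cube = np.zeros((4, 4, 4), dtype=np.uint8)
cube[1:3, 1:3, 1:3] = 1                           # 2x2x2 cube: vol=8, sa=24

print(sphericity_reference(single), sphericity_reference(cube))
```

The reference is far too slow for a 512x512x512 array, but on small masks it gives values the Cython function can be compared against.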
Performance comparison:
Original         : 2.123s
Jaime's          : 1.819s
Optimized Cython : 0.136s
@moarningsun     : 0.090s
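To put those timings in perspective, the short sketch below turns them into speedup factors relative to the original code. It only does arithmetic on the four numbers listed above; the dictionary names are illustrative:

```python
# Run times in seconds, copied from the comparison above.
timings = {
    "Original": 2.123,
    "Jaime's": 1.819,
    "Optimized Cython": 0.136,
    "@moarningsun": 0.090,
}

baseline = timings["Original"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print("%-17s %5.1fx" % (name, factor))
```

By this measure the optimized Cython version is roughly 15 to 16 times faster than the original on this data.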