Optimizing Python for grouping connected-component-labeled regions into subsets


I have a binary map on which I do connected component labeling, and for a 64x64 grid I get something like this -

Now I want to group the pixels by label so I can find the area and center of mass of each component. This is how I do it:

import numpy as np

#ccl_np is the computed array from the previous step (see pastebin)
#I discard the label '1' as it's the background
unique, count = np.unique(ccl_np, return_counts = True)
xcm_array = []
ycm_array = []

for i in range(1,len(unique)):
    subarray = np.where(ccl_np == unique[i])
    xcm_array.append("{0:.5f}".format((sum(subarray[0]))/(count[i]*1.)))
    ycm_array.append("{0:.5f}".format((sum(subarray[1]))/(count[i]*1.)))

final_array = zip(xcm_array,ycm_array,count[1:])
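
For comparison, the per-label areas and centroids from the loop above can also be computed in a single vectorized pass with np.bincount. The following is only a sketch (it is not part of the original post) and assumes the same ccl_np label array:

import numpy as np

# labels from the CCL step, assumed to be non-negative integers
labels = ccl_np.ravel()
rows, cols = np.indices(ccl_np.shape)

area = np.bincount(labels)                          # pixel count per label value
xsum = np.bincount(labels, weights=rows.ravel())    # per-label sum of row indices
ysum = np.bincount(labels, weights=cols.ravel())    # per-label sum of column indices

safe = np.maximum(area, 1)                          # avoid 0/0 for unused label values
xcm = xsum / safe
ycm = ysum / safe
# xcm[k], ycm[k] and area[k] describe label value k; the background label is
# discarded the same way as in the loop above.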
I want this to be fast (since I will be doing it for grids of size 4096x4096) and was told to look into numba. Here is my naive attempt:

import numpy as np
import numba

unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)

@numba.autojit
def mysolver(xcm_array, ycm_array, inverse, count):
    for i in range(64):
        for j in range(64):
            pos = inverse[i][j]
            local_count = count[pos]
            xcm_array[pos] += i/(local_count*1.)
            ycm_array[pos] += j/(local_count*1.)


mysolver(xcm_array, ycm_array, inverse, count)

final_array = zip(xcm_array,ycm_array,count)
To my surprise, using numba was slower, or at best equal in speed to the previous method. What am I doing wrong? Also, can this be done in Cython, and would that be faster?


I am using the packages included in the latest Anaconda Python 2.7 distribution.

I think the problem may be that you are timing the jit'ed code incorrectly. The first time you run the code, your timing includes the time it takes numba to compile it. This is called warming up the jit. If you call the function again, that cost is gone:

import numpy as np
import numba as nb

unique, inverse, count = np.unique(ccl_np, return_counts = True, return_inverse = True)
xcm_array = np.zeros(len(count),dtype=np.float32)
ycm_array = np.zeros(len(count),dtype=np.float32)
inverse = inverse.reshape(64,64)

def mysolver(xcm_array, ycm_array, inverse, count):
    for i in range(64):
        for j in range(64):
            pos = inverse[i][j]
            local_count = count[pos]
            xcm_array[pos] += i/(local_count*1.)
            ycm_array[pos] += j/(local_count*1.)

@nb.jit(nopython=True)
def mysolver_nb(xcm_array, ycm_array, inverse, count):
    for i in range(64):
        for j in range(64):
            pos = inverse[i,j]
            local_count = count[pos]
            xcm_array[pos] += i/(local_count*1.)
            ycm_array[pos] += j/(local_count*1.)
Then time it using timeit, which runs the code multiple times. First the pure Python version:

In [4]: %timeit mysolver(xcm_array, ycm_array, inverse, count)
10 loops, best of 3: 25.8 ms per loop
Then the numba version:

In [5]: %timeit mysolver_nb(xcm_array, ycm_array, inverse, count)
The slowest run took 3630.44 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 33.1 µs per loop

The numba code is roughly 1000x faster.
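
Outside IPython the warm-up can be done by hand: call the jit'ed function once so numba compiles it, then time only the later calls. A minimal sketch using the standard timeit module, reusing mysolver_nb and the arrays defined above:

import timeit

# First call compiles the function (the jit "warm-up"); its cost is paid once.
mysolver_nb(xcm_array, ycm_array, inverse, count)

# Later calls run the already-compiled code, so only that is measured here.
# Note that each call keeps accumulating into xcm_array/ycm_array, so this
# measures speed only, not the final centroid values.
t = timeit.timeit(lambda: mysolver_nb(xcm_array, ycm_array, inverse, count),
                  number=10000)
print("%.1f us per call" % (t / 10000 * 1e6))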

Try changing inverse[i][j] to inverse[i,j]. That should be a more efficient way to access the elements of a numpy array, with or without numba.

@JoshAdel I changed it to [i,j]; it's about 2% faster.

Thanks! I'm getting a 390x speedup for a 256x256 grid :) I didn't know about "warming up" the JIT compiler. Digging into the docs, I changed the signature to @jit([(float32[:],float32[:],int64[:,:],int64[:])], nopython=True, cache=True), and it's now 710x faster! :) For the earlier arrays, jit was defaulting to float64.
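
For reference, a minimal sketch of the explicitly typed decorator mentioned in the last comment (the function name mysolver_sig is invented here): passing the signature up front makes numba compile eagerly for those exact types, and cache=True keeps the compiled code on disk between runs:

import numba as nb
from numba import float32, int64

# Eagerly compiled for float32 centroid arrays and int64 label/count arrays,
# matching the dtypes produced by np.zeros(..., dtype=np.float32) and np.unique.
@nb.jit([(float32[:], float32[:], int64[:, :], int64[:])], nopython=True, cache=True)
def mysolver_sig(xcm_array, ycm_array, inverse, count):
    for i in range(inverse.shape[0]):
        for j in range(inverse.shape[1]):
            pos = inverse[i, j]
            local_count = count[pos]
            xcm_array[pos] += i / (local_count * 1.)
            ycm_array[pos] += j / (local_count * 1.)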