Python Cython numpy array indexer speed improvement
I wrote the following code in pure Python; what it does is described in its docstring:
import numpy as np
from scipy.ndimage.measurements import find_objects
import itertools

def alt_indexer(arr):
    """
    Returns a dictionary with the elements of arr as key
    and the corresponding slice as value.

    Note:
        This function assumes arr is sorted.

    Example:
        >>> arr = [0,0,3,2,1,2,3]
        >>> loc = alt_indexer(arr)
        >>> loc
        {0: (slice(0L, 2L, None),),
         1: (slice(2L, 3L, None),),
         2: (slice(3L, 5L, None),),
         3: (slice(5L, 7L, None),)}
        >>> arr = sorted(arr)
        >>> arr[loc[3][0]]
        [3, 3]
        >>> arr[loc[2][0]]
        [2, 2]
    """
    unique, counts = np.unique(arr, return_counts=True)
    labels = np.arange(1, len(unique) + 1)
    labels = np.repeat(labels, counts)
    slicearr = find_objects(labels)
    index_dict = dict(itertools.izip(unique, slicearr))
    return index_dict
Since I will be indexing very large arrays, I would like to speed this operation up using Cython. Here is the equivalent implementation:
import numpy as np
cimport numpy as np

def _indexer(arr):
    cdef tuple unique_counts = np.unique(arr, return_counts=True)
    cdef np.ndarray[np.int32_t, ndim=1] unique = unique_counts[0]
    # cast to match the declared int32 buffer type
    cdef np.ndarray[np.int32_t, ndim=1] counts = unique_counts[1].astype(np.int32)

    cdef int start = 0
    cdef int end
    cdef int i
    cdef dict d = {}

    for i in xrange(len(counts)):
        if i > 0:
            start = counts[i-1] + start
        end = counts[i] + start
        d[unique[i]] = slice(start, end)

    return d
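For the record, the .pyx above has to be compiled before it can be called as _indexer in the IPython session below. A minimal build script of the usual form, where the file name indexer.pyx is an assumption and not from the original post:

# setup.py -- minimal, assumed build script; the file name "indexer.pyx" is hypothetical
from distutils.core import setup
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=cythonize("indexer.pyx"),
    include_dirs=[numpy.get_include()],   # needed because the .pyx cimports numpy
)

Built with python setup.py build_ext --inplace, after which from indexer import _indexer makes the function available for benchmarking.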
Benchmark
I compared the time it takes to complete the two operations:
In [26]: import numpy as np
In [27]: rr=np.random.randint(0,1000,1000000)
In [28]: %timeit _indexer(rr)
10 loops, best of 3: 40.5 ms per loop
In [29]: %timeit alt_indexer(rr) #pure python
10 loops, best of 3: 51.4 ms per loop
As you can see, the speed improvement is minimal. I do realize that my code was already partly optimized, since I am using NumPy.

Is there a bottleneck I am not aware of?

Should I avoid np.unique and write my own implementation instead?

Thanks.
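One quick way to check for such a bottleneck is to time np.unique on its own; if it accounts for most of the 40-50 ms above, the Python-level loop, dict and slice objects are not the limiting factor. A minimal check, reusing the array from the benchmark above (no results shown, as this is only a suggested measurement):

In [30]: rr = np.random.randint(0,1000,1000000)
In [31]: %timeit np.unique(rr, return_counts=True)   # compare against the ~40-50 ms totals above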
Since arr holds non-negative, not very large and heavily repeated int numbers, here is an alternative approach that mimics the behaviour of np.unique(arr, return_counts=True) -
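The benchmarks below call this alternative unique_counts. A minimal sketch of what such a function could look like, assuming the usual np.bincount trick for small non-negative integers (the body here is an assumption, not necessarily the answer's exact code):

import numpy as np

def unique_counts(arr):
    # Assumed reconstruction: bincount returns the count of every value in
    # 0..arr.max(); keep only the values that actually occur.
    counts = np.bincount(arr)
    unique = np.nonzero(counts)[0]
    return unique, counts[unique]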
Runtime test
Case #1:

In [83]: arr = np.random.randint(0,100,(1000)) # Input array
In [84]: unique, counts = np.unique(arr, return_counts=True)
    ...: unique1, counts1 = unique_counts(arr)
    ...:
In [85]: np.allclose(unique,unique1)
Out[85]: True
In [86]: np.allclose(counts,counts1)
Out[86]: True
In [87]: %timeit np.unique(arr, return_counts=True)
10000 loops, best of 3: 53.2 µs per loop
In [88]: %timeit unique_counts(arr)
100000 loops, best of 3: 10.2 µs per loop
Case #2:

In [89]: arr = np.random.randint(0,1000,(10000)) # Input array
In [90]: %timeit np.unique(arr, return_counts=True)
1000 loops, best of 3: 713 µs per loop
In [91]: %timeit unique_counts(arr)
10000 loops, best of 3: 39.1 µs per loop
Case #3: Let's run a case in which unique has some missing numbers in the min-to-max range, and verify the results against the np.unique version as a sanity check. We won't have many repeated numbers in this case, and as such aren't expecting better performance either.
In [98]: arr = np.random.randint(0,10000,(1000)) # Input array
In [99]: unique, counts = np.unique(arr, return_counts=True)
...: unique1, counts1 = unique_counts(arr)
...:
In [100]: np.allclose(unique,unique1)
Out[100]: True
In [101]: np.allclose(counts,counts1)
Out[101]: True
In [102]: %timeit np.unique(arr, return_counts=True)
10000 loops, best of 3: 61.9 µs per loop
In [103]: %timeit unique_counts(arr)
10000 loops, best of 3: 71.8 µs per loop
Comments:

Cython loops are faster if you can convert the loop to pure C. In your case the loop still uses numpy.unique, plus Python dict and slice objects.

Nice solution! Unfortunately, the integers start at 1000000, and in some cases the array will contain floats.

@snowleopard Ah yes, then we would have to look for other means of optimization there.

You are right though, it seems np.unique needs to be replaced by something. What I really need is a dict of slices, and there appear to be some redundant operations.
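Picking up on that last comment, here is a sketch (an illustration, not code from the thread) of how the dict of slices could be built directly from the counts with a cumulative sum, skipping the find_objects labelling step:

import numpy as np

def slice_dict(arr):
    # Sketch only: for a sorted arr, value unique[i] occupies indices
    # [ends[i] - counts[i], ends[i]), so a cumsum of the counts gives the
    # slice boundaries directly, with no label array or find_objects pass.
    unique, counts = np.unique(arr, return_counts=True)
    ends = np.cumsum(counts)
    starts = ends - counts
    return {u: slice(s, e) for u, s, e in zip(unique, starts, ends)}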