Multiple cumulative sums within a Python numpy array
I'm new to numpy, so I apologize if this has already been asked. I'm looking for a vectorized solution that can run multiple cumsums of different sizes over a one-dimensional numpy array.
my_vector=np.array([1,2,3,4,5])
size_of_groups=np.array([3,2])
I would like something like
np.cumsum.group(my_vector,size_of_groups)
[1,3,6,4,9]
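I don't want a solution with loops; a numpy function or a combination of numpy operations would be ideal.
Not sure about numpy, but pandas can do this easily with groupby + cumsum.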
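A minimal sketch of the groupby + cumsum idea, assuming one integer label per element built with np.repeat. The names `labels` and `grouped_cumsum` are illustrative and not taken from the original answer:

```python
import numpy as np
import pandas as pd

my_vector = np.array([1, 2, 3, 4, 5])
size_of_groups = np.array([3, 2])

# One integer label per element, repeated by group size: [0, 0, 0, 1, 1]
labels = np.repeat(np.arange(len(size_of_groups)), size_of_groups)

# Cumulative sum restarts whenever the label changes
grouped_cumsum = pd.Series(my_vector).groupby(labels).cumsum().to_numpy()
print(grouped_cumsum)
```

This assumes the group sizes add up to the length of the vector; otherwise the labels and the data would not line up.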
Here's a vectorized solution -
import numpy as np

def intervaled_cumsum(ar, sizes):
    # Note: as written, this expects at least two groups, since idx[0]
    # must be a valid index into ar

    # Make a copy to be used as output array
    out = ar.copy()

    # Get cumulative values of array
    arc = ar.cumsum()

    # Get cumsumed indices (start of each following group) to be used to
    # place differentiated values into input array's copy
    idx = sizes.cumsum()

    # Place differentiated values that, when cumulatively summed later on,
    # would give us the desired intervaled cumsum
    out[idx[0]] = ar[idx[0]] - arc[idx[0]-1]
    out[idx[1:-1]] = ar[idx[1:-1]] - np.diff(arc[idx[:-1]-1])
    return out.cumsum()
Sample run -
In [114]: ar = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
...: sizes = np.array([3,2,2,3,2])
In [115]: intervaled_cumsum(ar, sizes)
Out[115]: array([ 1, 3, 6, 4, 9, 6, 13, 8, 17, 27, 11, 23])
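The trick is that subtracting the previous group's total at the first element of each group makes a single global cumsum restart at every group boundary. The sketch below is a reformulation of that idea using np.add.reduceat, not the answer's exact code, checked against a plain loop. The function names are illustrative, and it assumes every group has size at least 1:

```python
import numpy as np

def intervaled_cumsum_loop(ar, sizes):
    # Reference implementation: cumsum each group separately in a Python loop
    out = np.empty_like(ar)
    start = 0
    for size in sizes:
        out[start:start + size] = np.cumsum(ar[start:start + size])
        start += size
    return out

def intervaled_cumsum_reduceat(ar, sizes):
    # First index of every group except the first one
    starts = sizes.cumsum()[:-1]
    # Total of each group
    group_sums = np.add.reduceat(ar, np.r_[0, starts])
    adjusted = ar.copy()
    # Cancel the previous group's total at the start of each later group
    adjusted[starts] -= group_sums[:-1]
    return adjusted.cumsum()

ar = np.array([1, 2, 3, 4, 5])
sizes = np.array([3, 2])
result = intervaled_cumsum_reduceat(ar, sizes)
reference = intervaled_cumsum_loop(ar, sizes)
print(result, reference)
```

For the small example, the adjusted array is [1, 2, 3, -2, 5], whose running sum is [1, 3, 6, 4, 9].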
Benchmarking
Other approach -
# @cᴏʟᴅsᴘᴇᴇᴅ's solution
import pandas as pd
def pandas_soln(my_vector, sizes):
    s = pd.Series(my_vector)
    return s.groupby(s.index.isin(sizes.cumsum()).cumsum()).cumsum().values
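The given sample used two intervals, of lengths 3 and 2. The timings keep to those interval lengths (each group has size 2 or 3) and simply use a larger number of groups.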
Timings -
In [146]: N = 10000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [147]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
10000 loops, best of 3: 178 µs per loop
1000 loops, best of 3: 1.82 ms per loop
In [148]: N = 100000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [149]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
100 loops, best of 3: 3.91 ms per loop
100 loops, best of 3: 17.3 ms per loop
In [150]: N = 1000000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [151]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
10 loops, best of 3: 31.6 ms per loop
1 loop, best of 3: 357 ms per loop
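For readers working outside IPython, the same kind of measurement can be sketched with the standard-library timeit module. The helper `make_inputs` and the variable names are illustrative assumptions; the function body is repeated from the answer above so the script runs on its own, and absolute numbers will differ by machine:

```python
import timeit
import numpy as np

def intervaled_cumsum(ar, sizes):
    # Same function as in the answer above, repeated for self-containment
    out = ar.copy()
    arc = ar.cumsum()
    idx = sizes.cumsum()
    out[idx[0]] = ar[idx[0]] - arc[idx[0] - 1]
    out[idx[1:-1]] = ar[idx[1:-1]] - np.diff(arc[idx[:-1] - 1])
    return out.cumsum()

def make_inputs(n_groups, seed=0):
    # Group sizes of 2 or 3, mirroring the setup used in the timings
    np.random.seed(seed)
    sizes = np.random.randint(2, 4, n_groups)
    ar = np.random.randint(0, n_groups, sizes.sum())
    return ar, sizes

ar, sizes = make_inputs(10000)
n_runs = 100
total_seconds = timeit.timeit(lambda: intervaled_cumsum(ar, sizes), number=n_runs)
per_call_us = total_seconds / n_runs * 1e6
print(f"intervaled_cumsum: {per_call_us:.1f} µs per call")
```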
Here's an unconventional solution. It isn't very fast, though; it's even a bit slower than pandas.
>>> from scipy import linalg
>>>
>>> N = len(my_vector)
>>> D = np.repeat((*zip((1,-1)),), N, axis=1)
>>> D[1, np.cumsum(size_of_groups) - 1] = 0
>>>
>>> linalg.solve_banded((1, 0), D, my_vector)
array([1., 3., 6., 4., 9.])
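What the banded solve appears to exploit is that a cumulative sum x of b satisfies x[i] - x[i-1] = b[i], a lower-bidiagonal linear system; zeroing the subdiagonal entry at the end of a group cuts the link to the next element, so the sum restarts there. A small dense sketch of the same system, for illustration only (the names `sub`, `ends` and `A` are assumptions, and a dense solve is far slower than the banded one):

```python
import numpy as np

my_vector = np.array([1, 2, 3, 4, 5])
size_of_groups = np.array([3, 2])

N = len(my_vector)
# Subdiagonal: -1 links x[i+1] to x[i]; 0 cuts the link after a group's last element
sub = -np.ones(N - 1)
ends = np.cumsum(size_of_groups)[:-1] - 1  # last index of every group but the final one
sub[ends] = 0

# Row i of A encodes x[i] - x[i-1] = my_vector[i], or x[i] = my_vector[i] at a group start
A = np.eye(N) + np.diag(sub, k=-1)
x = np.linalg.solve(A, my_vector)
print(x)
```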