Is there a MATLAB accumarray equivalent in Python numpy?


I am looking for a fast NumPy equivalent of MATLAB's `accumarray`, which accumulates the elements of an array that belong to the same index. For example:

import numpy as np

a = np.arange(1,11)
# array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
accmap = np.array([0,1,0,0,0,1,1,2,2,1])

The result should be:

array([13, 25, 17])

What I have done so far: I tried an existing `accum` function, which works correctly but is slow:

accmap = np.repeat(np.arange(1000), 20)
a = np.random.randn(accmap.size)
%timeit accum(accmap, a, np.sum)
# 1 loops, best of 3: 293 ms per loop

Then I tried another solution that is supposed to be faster, but it does not give the correct result:

accum_np(accmap, a)
# array([  1.,   2.,  12.,  13.,  17.,  10.])

Is there a built-in NumPy function that can do this? Or any other suggestions?

Use `np.bincount` with the optional `weights` argument. In your example you would do:

np.bincount(accmap, weights=a)
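To make the one-liner concrete, here is a minimal sketch applied to the arrays from the question. The `minlength` argument is not mentioned in the answer; it is an existing `np.bincount` parameter, used here only to illustrate how trailing indices with no elements can still get a slot. Note that passing `weights` yields a floating-point result.

```python
import numpy as np

a = np.arange(1, 11)
accmap = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])

# Sum the elements of `a` that share the same index in `accmap`.
sums = np.bincount(accmap, weights=a)
print(sums)  # [13. 25. 17.]

# Assumption for illustration: we want 5 output slots even though only
# indices 0..2 occur; minlength pads the missing groups with zeros.
padded = np.bincount(accmap, weights=a, minlength=5)
print(padded)  # [13. 25. 17.  0.  0.]
```

`np.bincount` only handles non-negative integer indices and only sums, so other reductions need a different approach.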

How about the following:

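import numpy

def accumarray(a, accmap):
    # Sort the group indices so that equal indices become contiguous.
    ordered_indices = numpy.argsort(accmap)
    ordered_accmap = accmap[ordered_indices]

    # sum_indices[i] is the position where group i starts in the sorted order.
    _, sum_indices = numpy.unique(ordered_accmap, return_index=True)

    # Running total just before each group start; for the first group the
    # index wraps around to -1, which picks up the grand total.
    cumulative_sum = numpy.cumsum(a[ordered_indices])[sum_indices-1]

    # Running total at the end of each group (the last group ends at the grand total).
    result = numpy.empty(len(sum_indices), dtype=a.dtype)
    result[:-1] = cumulative_sum[1:]
    result[-1] = cumulative_sum[0]

    # Subtract what was accumulated before each group to get per-group sums.
    result[1:] = result[1:] - cumulative_sum[1:]

    return result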
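The same sort-then-reduce idea can be sketched more compactly with `np.add.reduceat`, which sums contiguous slices given their start positions. This is an alternative formulation, not the answer author's code, and the name `accum_reduceat` is hypothetical. Like the function above, it returns one entry per index that actually occurs rather than one per possible index.

```python
import numpy as np

def accum_reduceat(a, accmap):
    # Sort so that elements with the same index are contiguous.
    order = np.argsort(accmap, kind="stable")
    sorted_map = accmap[order]
    # `starts` holds the first position of each group in the sorted order.
    labels, starts = np.unique(sorted_map, return_index=True)
    # reduceat sums a[order][starts[i]:starts[i+1]] for each group.
    return labels, np.add.reduceat(a[order], starts)

a = np.arange(1, 11)
accmap = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])
labels, sums = accum_reduceat(a, accmap)
print(labels, sums)  # [0 1 2] [13 25 17]
```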

Not as good as the accepted answer, but:

[np.sum([a[x] for x in y]) for y in [list(np.where(accmap==z)) for z in np.unique(accmap).tolist()]]

This takes 108 µs per loop (100000 loops, best of 3).

The accepted answer (`np.bincount(accmap, weights=a)`) takes 2.05 µs per loop (100000 loops, best of 3).

I have written an accumarray implementation with `scipy.weave` and uploaded it to GitHub.

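Late to the party, but...

As @Jamie says, for the case of summing, `np.bincount` is fast and simple. In the more general case, however, for other `ufunc`s such as `maximum`, you can use the `ufunc.at` method.

I have put together a gist [see the link below instead] that wraps this in a MATLAB-like interface. It also takes advantage of the repeated-indexing rules to provide `'last'` and `'first'` functions, and, unlike MATLAB, `'mean'` is sensibly optimized (calling `accumarray` with `@mean` in MATLAB is really slow because it runs a non-builtin function for every single group, which is silly).

Be warned that I have not particularly tested the gist, but I hope to update it in the future with extra features and bug fixes.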
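As a minimal sketch of the `ufunc.at` approach for a reduction other than sum, the following computes a per-group maximum with `np.maximum.at`. It reuses the arrays from the question; the sentinel initial value is a choice made for this illustration, not something taken from the gist.

```python
import numpy as np

a = np.arange(1, 11)
accmap = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])

# Start every group at the smallest representable value so that any real
# element replaces it; a group with no elements would keep this sentinel.
group_max = np.full(accmap.max() + 1, np.iinfo(a.dtype).min, dtype=a.dtype)

# Unbuffered in-place operation: every occurrence of a repeated index is
# applied, unlike `group_max[accmap] = np.maximum(group_max[accmap], a)`.
np.maximum.at(group_max, accmap, a)
print(group_max)  # [ 5 10  9]
```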

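Update May/June 2015: I have reworked my implementation. It is now distributed as part of a package and is available on PyPI (`pip install numpy-groupies`). Benchmarks are as follows (see the GitHub repo for up-to-date values).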

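Here the indices are picked uniformly from [0, 1000). About 25% of the values are set aside for use with boolean operations, and the rest are uniformly distributed on [-50, 25). Timings are shown for 10 repeats.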

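purepy: uses nothing but pure Python, relying partly on `itertools.groupby`.

np-grouploop: uses `numpy` to sort the values based on `idx`, then uses `split` to create separate arrays, and then loops over these arrays, running the relevant `numpy` function for each one.

np-ufuncat: uses the `numpy` `ufunc.at` method, which is slower than it ought to be, as discussed in an issue I created on numpy's GitHub repo.

np-optimised: uses custom `numpy` indexing and other tricks to beat the two implementations above (except for `min`, `max`, and `prod`, which rely on `ufunc.at`).

pandas: `pd.DataFrame({'idx': idx, 'vals': vals}).groupby('idx').sum()` etc.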
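The first two strategies in the list can be sketched roughly as follows. These are illustrative reconstructions from the one-line descriptions, not the package's actual code; the function names `accum_purepy` and `accum_grouploop` are hypothetical, and both assume a non-empty input.

```python
from itertools import groupby
from operator import itemgetter

import numpy as np


def accum_purepy(idx, vals, func=sum):
    """Pure-Python grouping: sort (index, value) pairs, then group runs."""
    pairs = sorted(zip(idx, vals), key=itemgetter(0))
    return {
        key: func(value for _, value in run)
        for key, run in groupby(pairs, key=itemgetter(0))
    }


def accum_grouploop(idx, vals, func=np.sum):
    """Sort by idx, split into one array per group, apply func to each."""
    order = np.argsort(idx, kind="stable")
    sorted_idx = idx[order]
    # Positions in the sorted order where the group label changes.
    boundaries = np.flatnonzero(np.diff(sorted_idx)) + 1
    groups = np.split(vals[order], boundaries)
    labels = sorted_idx[np.r_[0, boundaries]]
    return labels, np.array([func(group) for group in groups])


idx = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])
vals = np.arange(1, 11)

print(accum_purepy(idx.tolist(), vals.tolist()))  # {0: 13, 1: 25, 2: 17}
labels, maxima = accum_grouploop(idx, vals, func=np.max)
print(labels, maxima)  # [0 1 2] [ 5 10  9]
```

The group loop accepts any reduction that takes an array, at the cost of one Python-level call per group, which is why it tends to lose to vectorized approaches when there are many groups.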

Note that some of the `no-impl` entries may be unwarranted; I just have not bothered to get them working yet.

As explained on GitHub, `accumarray` now supports `nan`-prefixed functions (e.g. `nansum`) as well as `sort`, `rsort`, and `array`. It also works with multidimensional indexing.

You can do this in one line with a pandas DataFrame:

In [159]: df = pd.DataFrame({"y":np.arange(1,11),"x":[0,1,0,0,0,1,1,2,2,1]})

In [160]: df
Out[160]: 
   x   y
0  0   1
1  1   2
2  0   3
3  0   4
4  0   5
5  1   6
6  1   7
7  2   8
8  2   9
9  1  10

In [161]: pd.pivot_table(df,values='y',index='x',aggfunc=sum)
Out[161]: 
    y
x    
0  13
1  25
2  17

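You can tell `pivot_table` to use specific columns as the index and the values, and get back a new DataFrame object. When you specify sum as the aggregation function, the result is the same as MATLAB's accumarray.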
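An equivalent way to express the same aggregation, not taken from the answer, is a `groupby` on a Series keyed by the index array. This is a minimal sketch using the arrays from the question, assuming pandas is installed.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 11)
accmap = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])

# Group the values of `a` by the matching entries of `accmap` and sum them.
sums = pd.Series(a).groupby(accmap).sum()

# The group labels end up in the index, the accumulated values in the data.
print(sums.index.tolist())     # [0, 1, 2]
print(sums.to_numpy().tolist())  # [13, 25, 17]
```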

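It depends on what you are trying to do, but numpy's `unique` has a number of optional outputs that you can use to accumulate. If your array contains several identical values, `unique` will count how many of each there are when the `return_counts` option is set to True. In some simple applications, that is all you need.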

numpy.unique(ar, return_index=False, return_inverse=False, return_counts=True, axis=None)

You can also set the index option to True and use the result to accumulate a different array.
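As a sketch of how those optional outputs can drive an accumulation, the example below uses `return_counts` to count occurrences and `return_inverse` to map arbitrary labels onto 0..n-1 so that a second array can be summed per label with `np.bincount`. The choice of `return_inverse` and the sample data are assumptions made for illustration; the answer does not say which index output it has in mind.

```python
import numpy as np

# Hypothetical sample data: string labels and a separate array of values.
labels = np.array(["b", "a", "b", "c", "a", "b"])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

uniq, inverse, counts = np.unique(
    labels, return_inverse=True, return_counts=True
)
print(uniq)    # ['a' 'b' 'c']
print(counts)  # [2 3 1]

# `inverse` gives, for each element, the position of its label in `uniq`,
# so it can serve as the integer index array for bincount.
sums = np.bincount(inverse, weights=vals)
print(sums)    # [ 7. 10.  4.]
```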

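My blog post is out of date. Please try the GitHub version; it has a test suite with good coverage.

@Michael and I have created a package called numpy-groupies, which contains an accumarray-like function called `aggregate`. See the answer below for details.

Nice work, guys. I am trying to use your routine. Unfortunately, I cannot reproduce the same results as MATLAB, and for multidimensional arrays it is hard to understand how it works. Could you help me out a bit?

It is best to file a bug report on the GitHub repo (a minimal code example would help).

Thanks for your answer. I will post a question titled "numpy groupies aggregate".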
