Python: How do I aggregate a NumPy record array (sum, min, max, etc.)?


Consider a simple record array structure:

import numpy as np
ijv_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('v', 'd'),
]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
    ], ijv_dtype)
print(ijv)  # [(0, 0, 3.3) (0, 1, 1.1) (0, 1, 4.4) (1, 1, 2.2)]
I want to group by the unique combinations of I and J and compute summary statistics (sum, min, max, etc.) over v. In SQL terms, the expected result is:

select i, j, sum(v) as v from ijv group by i, j;
 i | j |  v
---+---+-----
 0 | 0 | 3.3
 0 | 1 | 5.5
 1 | 1 | 2.2
(The order of the rows does not matter.)

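The best NumPy solution I can come up with is ugly, and I'm not confident that I have lined up the results correctly (although it seems to work here):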


I would think there is a better way to do this! I am using NumPy 1.4.1.

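NumPy is a bit too low-level for a task like this. If you must use pure numpy, I think your solution is fine, but if you don't mind using something at a higher level of abstraction, try pandas: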
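A minimal sketch of the pandas route, assuming pandas is installed and reusing the `ijv` array from the question. The variable names `df`, `grouped` and `stats` are illustrative only. It builds a DataFrame from the record array, prints it, then groups by I and J:

```python
import numpy as np
import pandas as pd

# Same record array as in the question.
ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# A structured array converts directly; the field names become column names.
df = pd.DataFrame(ijv)
print(df)

# Equivalent of: select i, j, sum(v) as v from ijv group by i, j
grouped = df.groupby(['I', 'J']).sum()
print(grouped)

# Several statistics at once (illustrative extension).
stats = df.groupby(['I', 'J'])['v'].agg(['sum', 'min', 'max'])
print(stats)
```

The result is indexed by the (I, J) pairs, so the grouping keys and the aggregated values stay aligned without any manual bookkeeping.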

Output:

   I  J    v
0  0  0  3.3
1  0  1  1.1
2  0  1  4.4
3  1  1  2.2
       v
I J     
0 0  3.3
  1  5.5
1 1  2.2

This isn't a big improvement over what you already have, but it at least gets rid of the for loop.

# Starting with your original setup

# Get the unique ij values and the mapping from ungrouped to grouped.
u_ij, inv_ij = np.unique(ijv[['I', 'J']], return_inverse=True)

# Create a totals array. You could do the fancy ijv_dtype thing if you wanted.
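# Note: np.zeros_like(u_ij.shape) would give a length-1 int array, so allocate
# a float accumulator with one slot per unique (I, J) pair instead.
totals = np.zeros(u_ij.shape)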

# Here's the magic bit. You can think of it as 
# totals[inv_ij] += ijv["v"] 
# except the above doesn't behave as expected sadly.
np.add.at(totals, inv_ij, ijv["v"])

print(totals)
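The same unbuffered `ufunc.at` pattern should extend to min and max through `np.minimum.at` and `np.maximum.at`. The sketch below is a variation rather than the answer's own code: it builds the group index from a plain 2-D key array with `np.unique(..., axis=0)` (an assumption made for portability across NumPy versions), and the names `keys`, `sums`, `mins` and `maxs` are illustrative. Note that `ufunc.at` requires NumPy 1.8 or later, so it is not available in 1.4.1.

```python
import numpy as np

# Same record array as in the question.
ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# One row per record holding its (I, J) key.
keys = np.column_stack((ijv['I'], ijv['J']))

# Unique key rows plus, for each record, the index of its group.
u_ij, inv_ij = np.unique(keys, axis=0, return_inverse=True)
inv_ij = inv_ij.ravel()  # guard against versions that return a 2-D inverse

n = len(u_ij)
sums = np.zeros(n)
mins = np.full(n, np.inf)    # identity element for minimum
maxs = np.full(n, -np.inf)   # identity element for maximum

# Unbuffered in-place updates: repeated indices are all applied.
np.add.at(sums, inv_ij, ijv['v'])
np.minimum.at(mins, inv_ij, ijv['v'])
np.maximum.at(maxs, inv_ij, ijv['v'])

for (i, j), s, lo, hi in zip(u_ij, sums, mins, maxs):
    print(i, j, s, lo, hi)
```

Because `np.unique` returns the keys sorted, row k of `u_ij` lines up with element k of each statistics array.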

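The fact that you are using numpy's multi-dtype (structured array) functionality suggests that you should be using pandas. It generally makes for less awkward code when you are trying to keep the i's, j's and v's together.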

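With an early numpy version like that, pandas may not be an option. My first attempt would be to collect the data in a collections.defaultdict(list), using the (i, j) tuples as keys. Then I can compute the desired statistics on each list.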
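A minimal sketch of that dictionary-based approach, assuming the `ijv` array from the question. The names `groups` and `stats` are illustrative, not taken from the original answer. It relies only on the standard library for the grouping itself:

```python
from collections import defaultdict

import numpy as np

# Same record array as in the question.
ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# Collect every v under its (i, j) key.
groups = defaultdict(list)
for i, j, v in ijv.tolist():  # tolist() yields plain Python tuples
    groups[(i, j)].append(v)

# Compute the statistics per group from the collected lists.
stats = {}
for key, values in groups.items():
    stats[key] = {
        'sum': sum(values),
        'min': min(values),
        'max': max(values),
    }

for key in sorted(stats):
    print(key, stats[key])
```

This loops in Python rather than in NumPy, so it is likely to be slower on large arrays, but the keys and their values cannot drift out of alignment.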