Python: How to aggregate a NumPy record array (sum, min, max, etc.)?
Consider a simple record array structure:
import numpy as np
ijv_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('v', 'd'),
]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)
print(ijv) # [(0, 0, 3.3) (0, 1, 1.1) (0, 1, 4.4) (1, 1, 2.2)]
I want to group by the unique combinations of I and J and get certain statistics (sum, min, max, etc.) from v. In SQL terms, the expected result is:
select i, j, sum(v) as v from ijv group by i, j;
i | j | v
---+---+-----
0 | 0 | 3.3
0 | 1 | 5.5
1 | 1 | 2.2
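(The order is not important.)
The best I could come up with in NumPy is ugly, and I'm not confident that I have ordered the results correctly (although it seems to work here):
I would think there is a better way to do this! I am using NumPy 1.4.1.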
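NumPy is a bit too low-level for a task like this. If you have to use pure numpy, I think your solution is fine, but if you don't mind using something at a higher level of abstraction, try: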
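A minimal sketch of the higher-level approach, assuming pandas is installed and that a `DataFrame.groupby` on the two key columns is what is intended here. It prints the full frame first and then the grouped sums:

```python
import numpy as np
import pandas as pd

ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# A structured array converts to a DataFrame with one column per field.
df = pd.DataFrame(ijv)
print(df)

# Group by the (I, J) pairs and sum v within each group.
grouped = df.groupby(['I', 'J']).sum()
print(grouped)
```

Swapping `.sum()` for `.min()`, `.max()`, or `.agg(['sum', 'min', 'max'])` should give the other statistics.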
Output:
   I  J    v
0  0  0  3.3
1  0  1  1.1
2  0  1  4.4
3  1  1  2.2
       v
I J
0 0  3.3
  1  5.5
1 1  2.2
This isn't much of an improvement over what you already have, but it at least gets rid of the for loop.
# Starting with your original setup
# Get the unique ij values and the mapping from ungrouped to grouped.
u_ij, inv_ij = np.unique(ijv[['I', 'J']], return_inverse=True)
# Create a float totals array, one slot per unique (I, J) pair.
# You could do the fancy ijv_dtype thing if you wanted.
totals = np.zeros(u_ij.shape)
# Here's the magic bit. You can think of it as
# totals[inv_ij] += ijv["v"]
# except that fancy-indexed += counts each repeated index only once.
# np.add.at (NumPy 1.8+) accumulates over repeated indices.
np.add.at(totals, inv_ij, ijv["v"])
print(totals)
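The same unbuffered-ufunc idea extends to the other statistics mentioned in the question. The following is a sketch, not part of the original answer: it assumes NumPy 1.13 or later (for `np.unique(..., axis=0)`), and the names `keys`, `sums`, `mins`, and `maxs` are illustrative only.

```python
import numpy as np

ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# Stack the key columns into an (n, 2) array and find the unique rows.
keys = np.stack([ijv['I'], ijv['J']], axis=1)
u_keys, inv = np.unique(keys, axis=0, return_inverse=True)
inv = inv.ravel()  # guard against NumPy versions that return a 2-D inverse

n_groups = len(u_keys)
sums = np.zeros(n_groups)
mins = np.full(n_groups, np.inf)    # identity element for minimum
maxs = np.full(n_groups, -np.inf)   # identity element for maximum

# ufunc.at applies the operation once per element, even for repeated indices.
np.add.at(sums, inv, ijv['v'])
np.minimum.at(mins, inv, ijv['v'])
np.maximum.at(maxs, inv, ijv['v'])

for (i, j), s, lo, hi in zip(u_keys, sums, mins, maxs):
    print(i, j, s, lo, hi)
```

Initialising `mins` and `maxs` with infinities means a group's first value always replaces the placeholder.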
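The fact that you are using numpy's multi-dtype capability suggests that you should be using pandas. It generally makes for less trouble in the code when trying to keep the i's, j's, and v's together.
With an early numpy version, pandas may not be an option. My first attempt would be to collect the data in a collections.defaultdict(list), using the (i, j) tuples as keys. Then I can compute the desired statistics on each list.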
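The dictionary-based grouping described above might look like the following sketch. It uses only the standard library plus the question's array; the names `groups` and `stats` are illustrative, and the three statistics collected are just an example.

```python
from collections import defaultdict

import numpy as np

ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
], ijv_dtype)

# Collect every v under its (i, j) key.
# tolist() yields plain Python tuples, so the keys are ordinary ints.
groups = defaultdict(list)
for i, j, v in ijv.tolist():
    groups[(i, j)].append(v)

# Compute (sum, min, max) for each group's list of values.
stats = {}
for key, values in groups.items():
    stats[key] = (sum(values), min(values), max(values))

for key in sorted(stats):
    print(key, stats[key])
```

This needs no NumPy features beyond building and iterating the array, which is the appeal for an old installation, at the cost of a Python-level loop over every record.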