Python: How do I aggregate a NumPy record array (sum, min, max, etc.)?

Consider a simple record array structure:

import numpy as np
ijv_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('v', 'd'),
]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
    ], ijv_dtype)
print(ijv)  # [(0, 0, 3.3) (0, 1, 1.1) (0, 1, 4.4) (1, 1, 2.2)]

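I would like to group by the unique combinations of I and J and compute certain statistics (sum, min, max, etc.) from v. In SQL terms, the expected result is: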

select i, j, sum(v) as v from ijv group by i, j;
 i | j |  v
---+---+-----
 0 | 0 | 3.3
 0 | 1 | 5.5
 1 | 1 | 2.2

(The order does not matter.)

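The best NumPy approach I can come up with is ugly, and I am not convinced that I have ordered the results correctly (although it seems to work here).

I would like to think there is a better way to do this! I am using NumPy 1.4.1.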

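NumPy is a bit too low-level for a task like this. If you must use pure numpy, I think your solution is fine, but if you don't mind using something at a higher level of abstraction, try pandas: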
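The snippet that belonged here did not survive. As a minimal sketch, assuming pandas is installed and using the `ijv` array from the question, something along these lines would print a frame and a grouped sum of the shape shown in the output. The names `df` and `grouped` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same record array as in the question.
ijv = np.array(
    [(0, 0, 3.3), (0, 1, 1.1), (0, 1, 4.4), (1, 1, 2.2)],
    dtype=[('I', 'i'), ('J', 'i'), ('v', 'd')],
)

# A structured array converts to a DataFrame with one column per field.
df = pd.DataFrame(ijv)
print(df)

# Group by the (I, J) combinations and sum the remaining column.
grouped = df.groupby(['I', 'J']).sum()
print(grouped)
```

If several statistics are wanted at once, `df.groupby(['I', 'J'])['v'].agg(['sum', 'min', 'max'])` is one way to get them in a single table.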

Output:

   I  J    v
0  0  0  3.3
1  0  1  1.1
2  0  1  4.4
3  1  1  2.2
       v
I J     
0 0  3.3
  1  5.5
1 1  2.2

This is not a big improvement over what you already have, but it at least gets rid of the for loop:

# Starting with your original setup

# Get the unique ij values and the mapping from ungrouped to grouped.
u_ij, inv_ij = np.unique(ijv[['I', 'J']], return_inverse=True)

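# Create a float totals array with one slot per unique (I, J) pair.
# You could do the fancy ijv_dtype thing if you wanted.
# (np.zeros_like(u_ij.shape) would build an int array from the shape tuple itself.)
totals = np.zeros(u_ij.shape)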

# Here's the magic bit. You can think of it as 
# totals[inv_ij] += ijv["v"] 
# except the above doesn't behave as expected sadly.
np.add.at(totals, inv_ij, ijv["v"])

print(totals)
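The same unbuffered-ufunc idea should extend to the other statistics mentioned in the question. The following is a sketch, not part of the original answer: it assumes a NumPy version that provides `ufunc.at` and `np.full` (1.8 or later, so not the 1.4.1 from the question), and the names `mins` and `maxs` are illustrative.

```python
import numpy as np

# Same record array as in the question.
ijv = np.array(
    [(0, 0, 3.3), (0, 1, 1.1), (0, 1, 4.4), (1, 1, 2.2)],
    dtype=[('I', 'i'), ('J', 'i'), ('v', 'd')],
)

# Unique (I, J) pairs and, for every row, the index of its pair.
u_ij, inv_ij = np.unique(ijv[['I', 'J']], return_inverse=True)
inv_ij = inv_ij.ravel()  # defensive: make sure the mapping is 1-D

# Start from +inf / -inf so the first value in each group always wins.
mins = np.full(len(u_ij), np.inf)
maxs = np.full(len(u_ij), -np.inf)

# Unbuffered in-place reductions: repeated indices are all applied.
np.minimum.at(mins, inv_ij, ijv['v'])
np.maximum.at(maxs, inv_ij, ijv['v'])

for key, lo, hi in zip(u_ij, mins, maxs):
    print(key, lo, hi)
```

Starting from infinities rather than zeros matters here: a zero initial value would silently clamp the minimum of any all-positive group.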

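The fact that you are using numpy's multi-dtype functionality suggests that you should be using pandas. It generally makes for much less awkward code when you are trying to keep values of different types together.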

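For older numpy versions, pandas may not be an option. My first attempt was to collect the data in a collections.defaultdict(list), using the (i, j) tuple as the key. I could then compute the desired statistics on each list.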
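The code for this approach is not shown in the answer. A minimal sketch of the idea, using only the standard library plus the `ijv` array from the question, might look like the following; the names `groups` and `stats` are hypothetical.

```python
from collections import defaultdict

import numpy as np

# Same record array as in the question.
ijv = np.array(
    [(0, 0, 3.3), (0, 1, 1.1), (0, 1, 4.4), (1, 1, 2.2)],
    dtype=[('I', 'i'), ('J', 'i'), ('v', 'd')],
)

# Collect the v values of every (i, j) combination into a list.
groups = defaultdict(list)
for i, j, v in ijv.tolist():  # tolist() yields plain Python tuples
    groups[(i, j)].append(v)

# Compute the desired statistics on each list.
stats = {}
for key, values in groups.items():
    stats[key] = {'sum': sum(values), 'min': min(values), 'max': max(values)}

for key in sorted(stats):
    print(key, stats[key])
```

This is a plain Python loop, so it trades speed for simplicity; it also makes no assumption about the rows being sorted by key.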
