Python pd.groupby() intermediate sums (subtotals)
On Python 3.6 with pandas 1.1.2, I'm trying to compute intermediate sums (subtotals). I could of course use sum(level), but that is neither elegant nor optimal, and I'm wondering whether there is a better way. For example:
df = pd.DataFrame.from_dict({'level_0': {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b', 6: 'b', 7: 'b', 8: 'c', 9: 'c', 10: 'c', 11: 'c', 12: 'c', 13: 'c', 14: 'c', 15: 'c'}, 'level_1': {0: 'aa', 1: 'aa', 2: 'bb', 3: 'aa', 4: 'aa', 5: 'aa', 6: 'cc', 7: 'cc', 8: 'bb', 9: 'bb', 10: 'cc', 11: 'cc', 12: 'cc', 13: 'dd', 14: 'dd', 15: 'dd'}, 'level_2': {0: 'aaa', 1: 'aab', 2: 'bba', 3: 'aaa', 4: 'aab', 5: 'aac', 6: 'cca', 7: 'ccb', 8: 'bba', 9: 'bbb', 10: 'cca', 11: 'ccb', 12: 'ccc', 13: 'dda', 14: 'ddb', 15: 'ddc'}, 'value': {0: 5, 1: 2, 2: 3, 3: 5, 4: 9, 5: 2, 6: 2, 7: 9, 8: 1, 9: 9, 10: 9, 11: 5, 12: 5, 13: 5, 14: 5, 15: 3}}).groupby(by=['level_0', 'level_1', 'level_2']).sum()
gives me:
                         value
level_0 level_1 level_2
a       aa      aaa          5
                aab          2
        bb      bba          3
b       aa      aaa          5
                aab          9
                aac          2
        cc      cca          2
                ccb          9
c       bb      bba          1
                bbb          9
        cc      cca          9
                ccb          5
                ccc          5
        dd      dda          5
                ddb          5
                ddc          3
Now I'd like to be able to get subtotals for each level_0 and each level_1 as well.
Here you go:
import pandas as pd
df = pd.DataFrame.from_dict({'level_0': {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b', 6: 'b', 7: 'b', 8: 'c', 9: 'c', 10: 'c', 11: 'c', 12: 'c', 13: 'c', 14: 'c', 15: 'c'}, 'level_1': {0: 'aa', 1: 'aa', 2: 'bb', 3: 'aa', 4: 'aa', 5: 'aa', 6: 'cc', 7: 'cc', 8: 'bb', 9: 'bb', 10: 'cc', 11: 'cc', 12: 'cc', 13: 'dd', 14: 'dd', 15: 'dd'}, 'level_2': {0: 'aaa', 1: 'aab', 2: 'bba', 3: 'aaa', 4: 'aab', 5: 'aac', 6: 'cca', 7: 'ccb', 8: 'bba', 9: 'bbb', 10: 'cca', 11: 'ccb', 12: 'ccc', 13: 'dda', 14: 'ddb', 15: 'ddc'},
'value': {0: 5, 1: 2, 2: 3, 3: 5, 4: 9, 5: 2, 6: 2, 7: 9, 8: 1, 9: 9, 10: 9, 11: 5, 12: 5, 13: 5, 14: 5, 15: 3}})
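# Sum at each grouping depth: full detail, per (level_0, level_1), and per level_0.
gb1 = df.groupby(by=['level_0', 'level_1', 'level_2']).sum().reset_index()
gb2 = df.groupby(by=['level_0', 'level_1']).sum().reset_index()
gb3 = df.groupby(by=['level_0']).sum().reset_index()
# Fill the collapsed levels with '' so each subtotal row sorts ahead of its detail rows.
gb2['level_2'] = ''
gb3['level_1'] = ''
gb3['level_2'] = ''
# Stack the three tables, then sort so every subtotal sits directly above its children.
gb_all = pd.concat((gb1, gb2, gb3), axis=0)
gb_all.sort_values(['level_0', 'level_1', 'level_2'], inplace=True)
gb_all.reset_index(inplace=True, drop=True)
print(gb_all)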
Output:
   level_0 level_1 level_2  value
0        a                     10
1        a      aa              7
2        a      aa     aaa      5
3        a      aa     aab      2
4        a      bb              3
5        a      bb     bba      3
6        b                     27
7        b      aa             16
8        b      aa     aaa      5
9        b      aa     aab      9
10       b      aa     aac      2
11       b      cc             11
12       b      cc     cca      2
13       b      cc     ccb      9
14       c                     42
15       c      bb             10
16       c      bb     bba      1
17       c      bb     bbb      9
18       c      cc             19
19       c      cc     cca      9
20       c      cc     ccb      5
21       c      cc     ccc      5
22       c      dd             13
23       c      dd     dda      5
24       c      dd     ddb      5
25       c      dd     ddc      3
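The same pattern (group at each depth, blank out the collapsed levels, concatenate, sort) can be written as a loop so it works for any number of key columns. The following is only a sketch: the helper name with_subtotals and the small sample frame are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def with_subtotals(df, keys, value_col):
    """Hypothetical helper: detail rows plus one subtotal row per key prefix."""
    parts = []
    # Walk from the full key list down to the first key alone.
    for depth in range(len(keys), 0, -1):
        part = df.groupby(keys[:depth])[value_col].sum().reset_index()
        # Levels that were summed away get '' so subtotals sort first.
        for collapsed in keys[depth:]:
            part[collapsed] = ""
        parts.append(part)
    out = pd.concat(parts, axis=0)[keys + [value_col]]
    return out.sort_values(keys).reset_index(drop=True)


# Made-up sample in the same shape as the question's data.
df = pd.DataFrame({
    "level_0": ["a", "a", "a", "b", "b"],
    "level_1": ["aa", "aa", "bb", "aa", "cc"],
    "level_2": ["aaa", "aab", "bba", "aaa", "cca"],
    "value": [5, 2, 3, 5, 2],
})

result = with_subtotals(df, ["level_0", "level_1", "level_2"], "value")
print(result)
```

This relies on '' sorting before any non-empty label, so it would need a different marker if an empty string were a legitimate key value.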
Nice solution. I'm working on something similar. You just need to add the following at the end:
gb_all.groupby(['level_0', 'level_1', 'level_2']).agg('first')
and it will match the exact desired output.
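To illustrate what that last step does: grouping by all three key columns moves them into a sorted MultiIndex, and because every key combination occurs only once, 'first' simply keeps the single row of each group. The sketch below uses a few made-up rows in the shape of gb_all. The set_index variant is an assumption of mine about an equivalent way to get the same layout, not something the comment proposes.

```python
import pandas as pd

# Made-up rows shaped like gb_all, deliberately unsorted; '' marks a subtotal row.
gb_all = pd.DataFrame({
    "level_0": ["a", "a", "a", "a"],
    "level_1": ["aa", "", "aa", "aa"],
    "level_2": ["aab", "", "aaa", ""],
    "value": [2, 7, 5, 7],
})

keys = ["level_0", "level_1", "level_2"]

# The step from the comment: key columns become a sorted MultiIndex.
by_groupby = gb_all.groupby(keys).agg("first")

# Assumed alternative: index on the keys directly, then sort the index.
by_set_index = gb_all.set_index(keys).sort_index()

print(by_groupby)
print(by_set_index)
```

Note that 'first' returns the first non-null value per group, so the two variants could differ if a key combination were duplicated or a value were missing.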