Python 从groupby的结果添加新列
具有下面的数据帧,并希望添加一个新列,指示groupby操作中与max关联的值。因此,我的DF将有许多这样的记录,我只想通过类似于Python 从groupby的结果添加新列,python,pandas,Python,Pandas,具有下面的数据帧,并希望添加一个新列,指示groupby操作中与max关联的值。因此,我的DF将有许多这样的记录,我只想通过类似于DF.groupby('B')['C'].mean().idxmax()的方法循环遍历每个组,但遍历整个DF,然后创建一个新的col,结果如下: A B C D 0 05641 KMPV 45 056 1 05649 KMPV 45 056 2 05650 KMPV 45 0
DF.groupby('B')['C'].mean().idxmax()
的方法循环遍历每个组,但遍历整个DF,然后创建一个新的col,结果如下:
A B C D
0 05641 KMPV 45 056
1 05649 KMPV 45 056
2 05650 KMPV 45 056
3 05651 KMPV 45 056
4 05653 KMPV 45 056
5 05678 KMPV 45 056
6 05654 KMPV 45 056
7 05609 KMPV 45 056
8 05620 KMPV 45 056
9 05633 KMPV 45 056
10 05601 KMPV 45 056
11 05602 KMPV 45 056
12 05603 KMPV 45 056
13 05662 KMVL 23 056
14 05665 KMVL 23 056
15 05672 KMVL 23 056
16 05677 KMVL 23 056
17 05648 KMVL 23 056
18 05652 KMVL 23 056
19 05680 KMVL 23 056
20 05655 KMVL 23 056
21 05656 KMVL 23 056
22 05657 KMVL 23 056
23 05661 KMVL 23 056
24 05681 KMVL 23 056
25 05682 KMVL 23 056
26 05647 K1V4 30 056
27 05658 K1V4 30 056
新数据框,列指示KMPV具有最高值
A B C D newval
0 05641 KMPV 45 056 KMPV
1 05649 KMPV 45 056 KMPV
2 05650 KMPV 45 056 KMPV
3 05651 KMPV 45 056 KMPV
4 05653 KMPV 45 056 KMPV
5 05678 KMPV 45 056 KMPV
6 05654 KMPV 45 056 KMPV
7 05609 KMPV 45 056 KMPV
8 05620 KMPV 45 056 KMPV
9 05633 KMPV 45 056 KMPV
10 05601 KMPV 45 056 KMPV
11 05602 KMPV 45 056 KMPV
12 05603 KMPV 45 056 KMPV
13 05662 KMVL 23 056 KMPV
14 05665 KMVL 23 056 KMPV
15 05672 KMVL 23 056 KMPV
16 05677 KMVL 23 056 KMPV
17 05648 KMVL 23 056 KMPV
18 05652 KMVL 23 056 KMPV
19 05680 KMVL 23 056 KMPV
20 05655 KMVL 23 056 KMPV
21 05656 KMVL 23 056 KMPV
22 05657 KMVL 23 056 KMPV
23 05661 KMVL 23 056 KMPV
24 05681 KMVL 23 056 KMPV
25 05682 KMVL 23 056 KMPV
26 05647 K1V4 30 056 KMPV
27 05658 K1V4 30 056 KMPV
简单地说。如何从下面的groupby中选择最大值?换句话说,我想过滤到056->KMPV和100->KJRB,因为它们的数字最高,以此类推
A B
056 K1V4 30
KMPV 45
KMVL 23
100 KJRB 130
KNYC 84
KTEB 80
又快又脏
数据:
{'A': {0: 5641, 1: 5649, 2: 5650, 3: 5651, 4: 5653, 5: 5678, 6: 5654, 7: 5609, 8: 5620, 9: 5633, 10: 5601, 11: 5602, 12: 5603, 13: 5662, 14: 5665, 15: 5672, 16: 5677, 17: 5648, 18: 5652, 19: 5680, 20: 10155, 21: 10156, 22: 10157, 23: 10161, 24: 10181, 25: 10182, 26: 10147, 27: 10158}, 'C': {0: 45, 1: 45, 2: 45, 3: 45, 4: 45, 5: 45, 6: 45, 7: 45, 8: 45, 9: 45, 10: 45, 11: 45, 12: 45, 13: 23, 14: 23, 15: 23, 16: 23, 17: 23, 18: 23, 19: 23, 20: 23, 21: 23, 22: 23, 23: 23, 24: 23, 25: 23, 26: 30, 27: 30}, 'B': {0: 'KMPV', 1: 'KMPV', 2: 'KMPV', 3: 'KMPV', 4: 'KMPV', 5: 'KMPV', 6: 'KMPV', 7: 'KMPV', 8: 'KMPV', 9: 'KMPV', 10: 'KMPV', 11: 'KMPV', 12: 'KMPV', 13: 'KMVL', 14: 'KMVL', 15: 'KMVL', 16: 'KMVL', 17: 'KMVL', 18: 'KMVL', 19: 'KMVL', 20: 'KMVL', 21: 'KMVL', 22: 'KMVL', 23: 'KMVL', 24: 'KMVL', 25: 'KMVL', 26: 'K1V4', 27: 'K1V4'}, 'D': {0: 56, 1: 56, 2: 56, 3: 56, 4: 56, 5: 56, 6: 56, 7: 56, 8: 56, 9: 56, 10: 56, 11: 56, 12: 56, 13: 56, 14: 56, 15: 56, 16: 56, 17: 56, 18: 56, 19: 56, 20: 101, 21: 101, 22: 101, 23: 101, 24: 101, 25: 101, 26: 101, 27: 101}}
import pandas as pd
df = pd.read_csv('so.csv')
df_ = df.groupby(['D'])
# Result of above line is a tuple.
for g, v in df_:
# Get max C of each group. Then get column B.
x = v[v['C'] == v['C'].max()]['B'].iat[0]
df.loc[df['D'] == g,'newval'] = x
print df
A B C D newval
0 5641 KMPV 45 56 KMPV
...
13 5662 KMVL 23 56 KMPV
...
20 10155 KMVL 23 101 K1V4
...
26 10147 K1V4 30 101 K1V4
27 10158 K1V4 30 101 K1V4
代码:
{'A': {0: 5641, 1: 5649, 2: 5650, 3: 5651, 4: 5653, 5: 5678, 6: 5654, 7: 5609, 8: 5620, 9: 5633, 10: 5601, 11: 5602, 12: 5603, 13: 5662, 14: 5665, 15: 5672, 16: 5677, 17: 5648, 18: 5652, 19: 5680, 20: 10155, 21: 10156, 22: 10157, 23: 10161, 24: 10181, 25: 10182, 26: 10147, 27: 10158}, 'C': {0: 45, 1: 45, 2: 45, 3: 45, 4: 45, 5: 45, 6: 45, 7: 45, 8: 45, 9: 45, 10: 45, 11: 45, 12: 45, 13: 23, 14: 23, 15: 23, 16: 23, 17: 23, 18: 23, 19: 23, 20: 23, 21: 23, 22: 23, 23: 23, 24: 23, 25: 23, 26: 30, 27: 30}, 'B': {0: 'KMPV', 1: 'KMPV', 2: 'KMPV', 3: 'KMPV', 4: 'KMPV', 5: 'KMPV', 6: 'KMPV', 7: 'KMPV', 8: 'KMPV', 9: 'KMPV', 10: 'KMPV', 11: 'KMPV', 12: 'KMPV', 13: 'KMVL', 14: 'KMVL', 15: 'KMVL', 16: 'KMVL', 17: 'KMVL', 18: 'KMVL', 19: 'KMVL', 20: 'KMVL', 21: 'KMVL', 22: 'KMVL', 23: 'KMVL', 24: 'KMVL', 25: 'KMVL', 26: 'K1V4', 27: 'K1V4'}, 'D': {0: 56, 1: 56, 2: 56, 3: 56, 4: 56, 5: 56, 6: 56, 7: 56, 8: 56, 9: 56, 10: 56, 11: 56, 12: 56, 13: 56, 14: 56, 15: 56, 16: 56, 17: 56, 18: 56, 19: 56, 20: 101, 21: 101, 22: 101, 23: 101, 24: 101, 25: 101, 26: 101, 27: 101}}
import pandas as pd
df = pd.read_csv('so.csv')
df_ = df.groupby(['D'])
# Result of above line is a tuple.
for g, v in df_:
# Get max C of each group. Then get column B.
x = v[v['C'] == v['C'].max()]['B'].iat[0]
df.loc[df['D'] == g,'newval'] = x
print df
A B C D newval
0 5641 KMPV 45 56 KMPV
...
13 5662 KMVL 23 56 KMPV
...
20 10155 KMVL 23 101 K1V4
...
26 10147 K1V4 30 101 K1V4
27 10158 K1V4 30 101 K1V4
结果:
{'A': {0: 5641, 1: 5649, 2: 5650, 3: 5651, 4: 5653, 5: 5678, 6: 5654, 7: 5609, 8: 5620, 9: 5633, 10: 5601, 11: 5602, 12: 5603, 13: 5662, 14: 5665, 15: 5672, 16: 5677, 17: 5648, 18: 5652, 19: 5680, 20: 10155, 21: 10156, 22: 10157, 23: 10161, 24: 10181, 25: 10182, 26: 10147, 27: 10158}, 'C': {0: 45, 1: 45, 2: 45, 3: 45, 4: 45, 5: 45, 6: 45, 7: 45, 8: 45, 9: 45, 10: 45, 11: 45, 12: 45, 13: 23, 14: 23, 15: 23, 16: 23, 17: 23, 18: 23, 19: 23, 20: 23, 21: 23, 22: 23, 23: 23, 24: 23, 25: 23, 26: 30, 27: 30}, 'B': {0: 'KMPV', 1: 'KMPV', 2: 'KMPV', 3: 'KMPV', 4: 'KMPV', 5: 'KMPV', 6: 'KMPV', 7: 'KMPV', 8: 'KMPV', 9: 'KMPV', 10: 'KMPV', 11: 'KMPV', 12: 'KMPV', 13: 'KMVL', 14: 'KMVL', 15: 'KMVL', 16: 'KMVL', 17: 'KMVL', 18: 'KMVL', 19: 'KMVL', 20: 'KMVL', 21: 'KMVL', 22: 'KMVL', 23: 'KMVL', 24: 'KMVL', 25: 'KMVL', 26: 'K1V4', 27: 'K1V4'}, 'D': {0: 56, 1: 56, 2: 56, 3: 56, 4: 56, 5: 56, 6: 56, 7: 56, 8: 56, 9: 56, 10: 56, 11: 56, 12: 56, 13: 56, 14: 56, 15: 56, 16: 56, 17: 56, 18: 56, 19: 56, 20: 101, 21: 101, 22: 101, 23: 101, 24: 101, 25: 101, 26: 101, 27: 101}}
import pandas as pd
df = pd.read_csv('so.csv')
df_ = df.groupby(['D'])
# Result of above line is a tuple.
for g, v in df_:
# Get max C of each group. Then get column B.
x = v[v['C'] == v['C'].max()]['B'].iat[0]
df.loc[df['D'] == g,'newval'] = x
print df
A B C D newval
0 5641 KMPV 45 56 KMPV
...
13 5662 KMVL 23 56 KMPV
...
20 10155 KMVL 23 101 K1V4
...
26 10147 K1V4 30 101 K1V4
27 10158 K1V4 30 101 K1V4
我认为这里的想法是使用
pd.merge
将原始数据中所有必需的列连接回分组数据帧。