Pandas - Add many new columns based on many aggregate functions

Pandas 1.0.5

import pandas as pd
d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40]
  })

#add columns
d['count'] = d.groupby(['card_id', 'day'])["amount"].transform('count')
d['min'] = d.groupby(['card_id', 'day'])["amount"].transform('min')
d['max'] = d.groupby(['card_id', 'day'])["amount"].transform('max')
I want to replace the three transform lines with a single line. I tried this:

d['count', 'min', 'max'] = d.groupby(['card_id', 'day'])["amount"].transform('count', 'min', 'max')
Error: "TypeError: count() takes 1 positional argument but 3 were given"
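The first error comes from how transform is called: SeriesGroupBy.transform(func, *args, **kwargs) applies a single function and forwards any extra positional arguments to it, so 'min' and 'max' end up being passed as arguments to count. The following is a minimal sketch of one way to keep it to a single statement, assuming a dict comprehension over the function names is acceptable. It calls transform once per name and unpacks the results into assign:

```python
import pandas as pd

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40],
})

# Group once and select "amount", giving a SeriesGroupBy
g = d.groupby(["card_id", "day"])["amount"]

# transform applies ONE function per call, so call it once per name.
# The dict comprehension builds {"count": ..., "min": ..., "max": ...}
# and ** unpacks it into keyword arguments for assign.
d = d.assign(**{name: g.transform(name) for name in ["count", "min", "max"]})
print(d)
```

Because dicts keep insertion order, the new columns appear in the same order as the list of names.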

I also tried:

d[('count', 'min', 'max')] = d.groupby(['card_id', 'day']).agg(
    count = pd.NamedAgg('amount', 'count')
    ,min = pd.NamedAgg('amount', 'min')
    ,max = pd.NamedAgg('amount', 'max')
)
Error: "TypeError: incompatible index of inserted column with frame index"
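The second error is a shape and index mismatch. The named aggregation returns one row per (card_id, day) group, indexed by a two-level MultiIndex. d has eight rows on a default 0..7 index, so pandas cannot line the result up with d's rows. Inspecting the aggregated frame shows the difference:

```python
import pandas as pd

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40],
})

agg = d.groupby(["card_id", "day"]).agg(
    count=pd.NamedAgg("amount", "count"),
    min=pd.NamedAgg("amount", "min"),
    max=pd.NamedAgg("amount", "max"),
)

# One row per group, keyed by the (card_id, day) MultiIndex
print(agg.index)
# One row per original record, keyed by 0..7
print(d.index)
print(len(agg), len(d))
```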

Use merge:

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40]
  })

df_out = d.groupby(['card_id', 'day']).agg(
    count = pd.NamedAgg('amount', 'count')
    ,min = pd.NamedAgg('amount', 'min')
    ,max = pd.NamedAgg('amount', 'max')
)

d.merge(df_out, left_on=['card_id', 'day'], right_index=True)
Output:

   card_id  day  amount  count  min  max
0        1    1       1      2    1    2
1        1    1       2      2    1    2
2        2    1      10      2   10   20
3        2    1      20      2   10   20
4        1    2       3      2    3    4
5        1    2       4      2    3    4
6        2    2      30      2   30   40
7        2    2      40      2   30   40

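The output of groupby has a MultiIndex on (card_id, day), and that index does not match the index of d, hence the error. However, we can call merge with left_on set to the key column names and right_index=True, which joins the columns of d to the index of the grouped output.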
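The same alignment can also be expressed with DataFrame.join. This is a swapped-in variant, not part of the answer above. The on parameter of join matches columns of the calling frame against the index of the other frame, here the (card_id, day) MultiIndex. Because join defaults to a left join, d keeps its own rows and index:

```python
import pandas as pd

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40],
})

# Aggregate once per (card_id, day) group
agg = d.groupby(["card_id", "day"]).agg(
    count=pd.NamedAgg("amount", "count"),
    min=pd.NamedAgg("amount", "min"),
    max=pd.NamedAgg("amount", "max"),
)

# Match d's card_id/day columns against the levels of agg's MultiIndex
out = d.join(agg, on=["card_id", "day"])
print(out)
```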

You can use the assign function to get the result in one go:

# Select "amount" so each transform returns a single Series
grouping = d.groupby(["card_id", "day"])["amount"]
d.assign(
    count=grouping.transform("count"),
    min=grouping.transform("min"),
    max=grouping.transform("max"),
)


   card_id  day  amount  count  min  max
0        1    1       1      2    1    2
1        1    1       2      2    1    2
2        2    1      10      2   10   20
3        2    1      20      2   10   20
4        1    2       3      2    3    4
5        1    2       4      2    3    4
6        2    2      30      2   30   40
7        2    2      40      2   30   40
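To confirm that the merge answer and the assign answer are interchangeable, a small check can build both frames and compare them with pandas.testing.assert_frame_equal. This sketch passes how="left" to merge, which is an assumption not found in the original answer, so that d's row order and index are guaranteed to be kept:

```python
import pandas as pd

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40],
})
keys = ["card_id", "day"]

# Answer 1: aggregate once, then merge back on the group keys
agg = d.groupby(keys).agg(
    count=pd.NamedAgg("amount", "count"),
    min=pd.NamedAgg("amount", "min"),
    max=pd.NamedAgg("amount", "max"),
)
via_merge = d.merge(agg, left_on=keys, right_index=True, how="left")

# Answer 2: broadcast each aggregate back to every row with transform
g = d.groupby(keys)["amount"]
via_assign = d.assign(
    count=g.transform("count"),
    min=g.transform("min"),
    max=g.transform("max"),
)

# Raises AssertionError if values, dtypes, columns or index differ
pd.testing.assert_frame_equal(via_merge, via_assign)
```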