Pandas - add many new columns based on many aggregate functions
Pandas 1.0.5
import pandas as pd
d = pd.DataFrame({
"card_id": [1, 1, 2, 2, 1, 1, 2, 2],
"day": [1, 1, 1, 1, 2, 2, 2, 2],
"amount": [1, 2, 10, 20, 3, 4, 30, 40]
})
#add columns
d['count'] = d.groupby(['card_id', 'day'])["amount"].transform('count')
d['min'] = d.groupby(['card_id', 'day'])["amount"].transform('min')
d['max'] = d.groupby(['card_id', 'day'])["amount"].transform('max')
I want to replace the three transform lines with a single line. I tried this:
d['count', 'min', 'max'] = d.groupby(['card_id', 'day'])["amount"].transform('count', 'min', 'max')
Error: "TypeError: count() takes 1 positional argument but 3 were given"
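A minimal sketch of where this error comes from, assuming the same `d` as above (`grouped`, `raised` and `per_group` are illustrative names). `SeriesGroupBy.transform(func, *args, **kwargs)` forwards any extra positional arguments to `func`, so `transform('count', 'min', 'max')` ends up calling `count('min', 'max')`. The method that does accept a list of function names is `agg`, but it returns one row per group rather than one row per original row:

```python
import pandas as pd

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40],
})
grouped = d.groupby(["card_id", "day"])["amount"]

# transform(func, *args) passes "min" and "max" on to count() as
# positional arguments, which count() does not accept.
try:
    grouped.transform("count", "min", "max")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# agg accepts a list of function names, but the result has one row
# per (card_id, day) group, indexed by a two-level MultiIndex.
per_group = grouped.agg(["count", "min", "max"])
print(per_group)
```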
I also tried:
d[('count', 'min', 'max')] = d.groupby(['card_id', 'day']).agg(
count = pd.NamedAgg('amount', 'count')
,min = pd.NamedAgg('amount', 'min')
,max = pd.NamedAgg('amount', 'max')
)
Error: "TypeError: incompatible index of inserted column with frame index"
Use merge:
import pandas as pd
d = pd.DataFrame({
"card_id": [1, 1, 2, 2, 1, 1, 2, 2],
"day": [1, 1, 1, 1, 2, 2, 2, 2],
"amount": [1, 2, 10, 20, 3, 4, 30, 40]
})
df_out = d.groupby(['card_id', 'day']).agg(
count = pd.NamedAgg('amount', 'count')
,min = pd.NamedAgg('amount', 'min')
,max = pd.NamedAgg('amount', 'max')
)
d.merge(df_out, left_on=['card_id', 'day'], right_index=True)
Output:
card_id day amount count min max
0 1 1 1 2 1 2
1 1 1 2 2 1 2
2 2 1 10 2 10 20
3 2 1 20 2 10 20
4 1 2 3 2 3 4
5 1 2 4 2 3 4
6 2 2 30 2 30 40
7 2 2 40 2 30 40
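A minimal sketch, assuming the same `d` as in the question, that makes the index mismatch visible: the aggregated frame is indexed by `(card_id, day)` pairs while `d` keeps its original row positions, so a direct column assignment cannot align them. It also shows `DataFrame.join(..., on=[...])` as a swapped-in alternative to the `merge` call above (not what this answer uses): `on` names the caller's columns to match against the levels of the other frame's MultiIndex. `df_out`, `joined` and `merged` are illustrative names.

```python
import pandas as pd

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40],
})

# One row per (card_id, day) group, indexed by a two-level MultiIndex
df_out = d.groupby(["card_id", "day"])["amount"].agg(["count", "min", "max"])

print(d.index)       # original row positions 0..7
print(df_out.index)  # (card_id, day) pairs: nothing to align with d.index

# join(on=...) matches d's key columns against df_out's index levels,
# similar to merge(left_on=..., right_index=True); join defaults to a left join.
joined = d.join(df_out, on=["card_id", "day"])
merged = d.merge(df_out, left_on=["card_id", "day"], right_index=True)
print(joined)
```

Because every `(card_id, day)` key in `d` has a matching group, the left join performed by `join` and the inner join performed by `merge` return the same rows here.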
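The GroupBy output has a MultiIndex of (card_id, day), and that index does not match the index of d, hence the error. However, we can use merge with left_on=['card_id', 'day'] and right_index=True to join the columns of d to the index of the groupby output.
You can use the assign function to get the result in one go: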
grouping = d.groupby(["card_id", "day"])["amount"]
d.assign(
count=grouping.transform("count"),
min=grouping.transform("min"),
max=grouping.transform("max"),
)
card_id day amount count min max
0 1 1 1 2 1 2
1 1 1 2 2 1 2
2 2 1 10 2 10 20
3 2 1 20 2 10 20
4 1 2 3 2 3 4
5 1 2 4 2 3 4
6 2 2 30 2 30 40
7 2 2 40 2 30 40
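If the goal is literally one line for any number of statistics, a minimal sketch (assuming the same `d` as in the question; `grouped`, `funcs` and `result` are illustrative names) is to build the keyword arguments for `assign` from a list of function names with a dict comprehension:

```python
import pandas as pd

d = pd.DataFrame({
    "card_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "day": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount": [1, 2, 10, 20, 3, 4, 30, 40],
})
grouped = d.groupby(["card_id", "day"])["amount"]

# Each transform returns a Series aligned with d's index; the dict keys
# become the new column names, in list order.
funcs = ["count", "min", "max"]
result = d.assign(**{name: grouped.transform(name) for name in funcs})
print(result)
```

Adding another statistic then only means extending `funcs`, as long as each name is one that `transform` accepts as a string.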