Python: how to compute a custom aggregation after groupby
I have a DataFrame df like the following:
ID COMMODITY_CODE DELIVERY_TYPE DAY Window_start_time case_qty deliveries
6042.0 SCGR Live 1.0 15:00 15756.75 7.75
6042.0 SCGR Live 1.0 18:00 15787.75 5.75
6042.0 SCGR Live 1.0 21:00 10989.75 4.75
6042.0 SCGR Live 2.0 15:00 21025.25 9.00
6042.0 SCGR Live 2.0 18:00 16041.75 5.75
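For anyone who wants to reproduce the problem, the sample above can be rebuilt roughly as follows. This is only a sketch: it assumes Window_start_time is stored as plain strings and that the numeric columns are floats, neither of which the question states.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: Window_start_time is kept as strings, not as time objects.
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start_time": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

print(df)
```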
I want to group by ID, COMMODITY_CODE, DELIVERY_TYPE and DAY, and compute case_qty_ratio and dlvry_ratio within each group, as in the output below:
ID COMMODITY_CODE DELIVERY_TYPE DAY case_qty deliveries dlvry_ratio case_qty_ratio
6042.0 SCGR Live 1.0 15756.75 7.75 0.42 0.37
6042.0 SCGR Live 1.0 15787.75 5.75 0.31 0.37
6042.0 SCGR Live 1.0 10989.75 4.75 0.26 0.25
6042.0 SCGR Live 2.0 21025.25 9.00 0.61 0.56
6042.0 SCGR Live 2.0 16041.75 5.75 0.39 0.44
I tried the code below, using lambda functions to aggregate this information:
df.groupby(['ID','COMMODITY_CODE','DELIVERY_TYPE','DAY'] \
,as_index=False) \
.agg( \
delivery_ratio=("deliveries",lambda x: x / x.sum()), \
case_ratio=(lambda x: x/ x.sum() ) /
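But this does not work. Any help would be appreciated.

Try the following. agg expects each function to reduce a group to a single value, whereas x / x.sum() returns one value per row, so transform is the better fit: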
df[['case_ratio', 'delivery_ratio']] = df.groupby(['ID','COMMODITY_CODE','DELIVERY_TYPE','DAY'],
as_index=False)[['case_qty', 'deliveries']]\
.transform(lambda x: x/x.sum())
Output:
ID COMMODITY_CODE DELIVERY_TYPE DAY Window_start_time case_qty deliveries case_ratio delivery_ratio
0 6042.0 SCGR Live 1.0 15:00 15756.75 7.75 0.370449 0.424658
1 6042.0 SCGR Live 1.0 18:00 15787.75 5.75 0.371177 0.315068
2 6042.0 SCGR Live 1.0 21:00 10989.75 4.75 0.258374 0.260274
3 6042.0 SCGR Live 2.0 15:00 21025.25 9.00 0.567223 0.610169
4 6042.0 SCGR Live 2.0 18:00 16041.75 5.75 0.432777 0.389831
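To make the difference between agg and transform concrete, here is a small self-contained sketch on the same sample data. The names keys, totals, ratios and check are mine, used only for illustration, and the new columns are assigned one at a time purely to keep the mapping explicit.

```python
import pandas as pd

# Same sample data as in the question (Window_start_time omitted for brevity).
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})
keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]

# agg reduces each group to one row: here, the total deliveries per group.
totals = df.groupby(keys, as_index=False).agg(total_deliveries=("deliveries", "sum"))

# transform returns a result aligned with the original rows,
# so each row can be divided by its own group's sum.
ratios = df.groupby(keys)[["case_qty", "deliveries"]].transform(lambda x: x / x.sum())
df["case_ratio"] = ratios["case_qty"]
df["delivery_ratio"] = ratios["deliveries"]

# Sanity check: within every group the ratios should add up to 1.
check = df.groupby(keys)[["case_ratio", "delivery_ratio"]].sum()
print(totals)
print(df)
print(check)
```

Here totals has one row per group, while ratios has one row per original row, which is why only the latter can be written back into df as new columns.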
Similar to Scott's answer, but transform with 'sum' first and then divide:
cols = ['case_qty', 'deliveries']
df = df.join(df[cols].div(df.groupby(['ID','COMMODITY_CODE','DELIVERY_TYPE','DAY'])
[cols].transform('sum')
)
.add_suffix('_ratio')
)
Output:
ID COMMODITY_CODE DELIVERY_TYPE DAY Window_start_time case_qty deliveries case_qty_ratio deliveries_ratio
0 6042.0 SCGR Live 1.0 15:00 15756.75 7.75 0.370449 0.424658
1 6042.0 SCGR Live 1.0 18:00 15787.75 5.75 0.371177 0.315068
2 6042.0 SCGR Live 1.0 21:00 10989.75 4.75 0.258374 0.260274
3 6042.0 SCGR Live 2.0 15:00 21025.25 9.00 0.567223 0.610169
4 6042.0 SCGR Live 2.0 18:00 16041.75 5.75 0.432777 0.389831
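As a rough check that this variant gives the same numbers as the lambda-based transform, the sketch below runs both on the sample data and compares them. The names group_sums, result and via_lambda are illustrative only. Passing the string 'sum' lets pandas use its built-in grouped sum instead of calling a Python function per group, which is usually the faster option on large frames, though I have not benchmarked it here.

```python
import pandas as pd

# Same sample data as in the question (Window_start_time omitted for brevity).
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})
keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]
cols = ["case_qty", "deliveries"]

# Broadcast each group's sum back onto its rows, then divide element-wise.
group_sums = df.groupby(keys)[cols].transform("sum")
ratios = df[cols].div(group_sums).add_suffix("_ratio")
result = df.join(ratios)

# The lambda-based transform should produce the same values.
via_lambda = df.groupby(keys)[cols].transform(lambda x: x / x.sum()).add_suffix("_ratio")
max_diff = (ratios - via_lambda).abs().max().max()

print(result)
print(max_diff)
```

Note that add_suffix names the new columns case_qty_ratio and deliveries_ratio, so they differ from the case_ratio and delivery_ratio names used in the first answer.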