Python: how to compute a custom aggregation after a groupby

python pandas numpy pandas-groupby aggregate

I have a DataFrame df like the following:

ID   COMMODITY_CODE   DELIVERY_TYPE  DAY   Window_start_time     case_qty     deliveries
6042.0      SCGR        Live         1.0    15:00                 15756.75    7.75
6042.0      SCGR        Live         1.0    18:00                 15787.75    5.75
6042.0      SCGR        Live         1.0    21:00                 10989.75    4.75
6042.0      SCGR        Live         2.0    15:00                 21025.25    9.00
6042.0      SCGR        Live         2.0    18:00                 16041.75    5.75

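I want to group by ID, COMMODITY_CODE, DELIVERY_TYPE and DAY, and compute case_qty_ratio and dlvry_ratio (each row's share of its group total), as in the output below: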

ID      COMMODITY_CODE  DELIVERY_TYPE  DAY  case_qty  deliveries  dlvry_ratio  case_qty_ratio
6042.0  SCGR            Live           1.0  15756.75  7.75        0.42         0.37
6042.0  SCGR            Live           1.0  15787.75  5.75        0.31         0.37
6042.0  SCGR            Live           1.0  10989.75  4.75        0.26         0.25
6042.0  SCGR            Live           2.0  21025.25  9.00        0.61         0.56
6042.0  SCGR            Live           2.0  16041.75  5.75        0.39         0.44

I tried the code below, using lambda functions to aggregate this information:

df.groupby(['ID', 'COMMODITY_CODE', 'DELIVERY_TYPE', 'DAY'], as_index=False).agg(
    delivery_ratio=("deliveries", lambda x: x / x.sum()),
    case_ratio=("case_qty", lambda x: x / x.sum()),
)

But this does not work. Any help would be appreciated.

Try it this way:

df[['case_ratio', 'delivery_ratio']] = df.groupby(['ID','COMMODITY_CODE','DELIVERY_TYPE','DAY'], 
                                                   as_index=False)[['case_qty', 'deliveries']]\
                                          .transform(lambda x: x/x.sum())

Output:

       ID COMMODITY_CODE DELIVERY_TYPE  DAY Window_start_time  case_qty  deliveries  case_ratio   delivery_ratio
0  6042.0           SCGR          Live  1.0             15:00  15756.75        7.75     0.370449        0.424658
1  6042.0           SCGR          Live  1.0             18:00  15787.75        5.75     0.371177        0.315068
2  6042.0           SCGR          Live  1.0             21:00  10989.75        4.75     0.258374        0.260274
3  6042.0           SCGR          Live  2.0             15:00  21025.25        9.00     0.567223        0.610169
4  6042.0           SCGR          Live  2.0             18:00  16041.75        5.75     0.432777        0.389831
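A likely reason the `agg` attempt in the question fails is that `agg` expects each function to reduce a group to a single value, whereas `x / x.sum()` returns one value per row. `transform` is the method that returns a result aligned with the original rows. The following is a minimal, self-contained sketch of that difference, with the sample data retyped from the question. The names `keys`, `grouped`, `totals` and `ratios` are illustrative and not part of the answer.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start_time": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]
grouped = df.groupby(keys)[["case_qty", "deliveries"]]

# agg reduces every group to a single row: 2 groups give 2 rows.
totals = grouped.agg("sum")

# transform keeps the original shape: 5 input rows give 5 output rows.
ratios = grouped.transform(lambda x: x / x.sum())

# Rename explicitly, then attach to the original frame by index.
ratios.columns = ["case_ratio", "delivery_ratio"]
df = df.join(ratios)

print(totals.shape, ratios.shape)
print(df[["DAY", "case_ratio", "delivery_ratio"]].round(6))
```

Because the result of `transform` shares the index of `df`, it can be joined or assigned back without any merge on the group keys.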

Similar to Scott's answer, but transform with 'sum' only and then divide:

cols = ['case_qty', 'deliveries']
df = df.join(df[cols].div(df.groupby(['ID','COMMODITY_CODE','DELIVERY_TYPE','DAY'])
                            [cols].transform('sum')
                         )
                     .add_suffix('_ratio')
            )

Output:

       ID COMMODITY_CODE DELIVERY_TYPE  DAY Window_start_time  case_qty  deliveries  case_qty_ratio  deliveries_ratio
0  6042.0           SCGR          Live  1.0             15:00  15756.75        7.75        0.370449          0.424658
1  6042.0           SCGR          Live  1.0             18:00  15787.75        5.75        0.371177          0.315068
2  6042.0           SCGR          Live  1.0             21:00  10989.75        4.75        0.258374          0.260274
3  6042.0           SCGR          Live  2.0             15:00  21025.25        9.00        0.567223          0.610169
4  6042.0           SCGR          Live  2.0             18:00  16041.75        5.75        0.432777          0.389831
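The divide-by-transformed-sum approach can be checked end to end with a small sketch like the one below. It rebuilds the sample data from the question, applies the same `transform('sum')` and `div` steps, and verifies that the ratios in each group add up to 1. The zero-total guard (`safe_sums`) and the `check` frame are additions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "ID": [6042.0] * 5,
    "COMMODITY_CODE": ["SCGR"] * 5,
    "DELIVERY_TYPE": ["Live"] * 5,
    "DAY": [1.0, 1.0, 1.0, 2.0, 2.0],
    "Window_start_time": ["15:00", "18:00", "21:00", "15:00", "18:00"],
    "case_qty": [15756.75, 15787.75, 10989.75, 21025.25, 16041.75],
    "deliveries": [7.75, 5.75, 4.75, 9.00, 5.75],
})

keys = ["ID", "COMMODITY_CODE", "DELIVERY_TYPE", "DAY"]
cols = ["case_qty", "deliveries"]

# Per-group totals, repeated on every row of the group.
group_sums = df.groupby(keys)[cols].transform("sum")

# Assumption (not in the answer): treat a zero group total as missing,
# so the ratio becomes NaN instead of inf.
safe_sums = group_sums.replace(0, np.nan)

# Element-wise division, then suffix the column names and join back.
ratios = df[cols].div(safe_sums).add_suffix("_ratio")
out = df.join(ratios)

# Sanity check: the ratios within each group should sum to 1.
check = out.groupby(keys)[["case_qty_ratio", "deliveries_ratio"]].sum()
print(check)
```

Using the built-in `'sum'` instead of a Python lambda lets pandas use its optimized group-wise implementation, which tends to matter on large frames.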
