Python Pandas: keeping a column after aggregation

My data looks like this:

The code used to build it:

import pandas as pd

Data = pd.DataFrame({'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                     'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
                     'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
                     'ProductCost': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22]})

My question is: how can I keep a column after aggregating the columns I need?

In my example, I want the aggregated data to include the Product_ID column. The code I used for the aggregation and its result are below:

Data_aggr = Data.groupby('Customer_ID').agg({'SalesAmount': [min, sum],
                                            'ProductCost': ['mean', max]}, 'Product_ID')

Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns.ravel()]

Data_aggr.index.name = 'Customer_ID'

Data_aggr.reset_index(inplace=True)
Data_aggr

Result:

My expected output is:

You need to aggregate every column you want to keep, for example with 'first':

Data_aggr = Data.groupby('Customer_ID').agg({'SalesAmount': ['min', 'sum'],
                                            'ProductCost': ['mean', 'max'],
                                            'Product_ID': 'first'})  # keep the first Product_ID of each customer

# flatten the MultiIndex columns, e.g. ('SalesAmount', 'min') -> 'SalesAmount_min'
Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns]

Data_aggr = Data_aggr.rename(columns={'Product_ID_first': 'Product_ID'}).reset_index()
print(Data_aggr)
   Customer_ID  SalesAmount_min  SalesAmount_sum  ProductCost_mean  \
0            1            12.34            24.68             12.34   
1            2            13.55            27.10             13.55   
2            3            34.00            68.00             34.00   
3            4            19.15            38.30             19.15   
4            5            13.22            26.44             13.22   

   ProductCost_max Product_ID  
0            12.34          A  
1            13.55          D  
2            34.00          C  
3            19.15          A  
4            13.22          E
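As a variation on the answer above (not part of the original answer), pandas 0.25 and later also support named aggregation, where each keyword argument of agg becomes an output column. This sketch assumes that pandas version and produces the same columns without flattening or renaming the MultiIndex:

```python
import pandas as pd

Data = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
    'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
    'ProductCost': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
})

# Named aggregation: keyword = (source column, aggregation).
# The keywords become flat column names directly, so no join/rename step is needed.
Data_aggr = (
    Data.groupby('Customer_ID')
        .agg(SalesAmount_min=('SalesAmount', 'min'),
             SalesAmount_sum=('SalesAmount', 'sum'),
             ProductCost_mean=('ProductCost', 'mean'),
             ProductCost_max=('ProductCost', 'max'),
             Product_ID=('Product_ID', 'first'))  # first Product_ID per customer
        .reset_index()
)
print(Data_aggr)
```

The column order follows the order of the keyword arguments, so Product_ID ends up last, as in the output above.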

Or group by both columns, which gives a different output:

Data_aggr = Data.groupby(['Customer_ID', 'Product_ID']).agg({'SalesAmount': ['min', 'sum'],
                                                             'ProductCost': ['mean', 'max']})

# flatten the MultiIndex columns, e.g. ('SalesAmount', 'min') -> 'SalesAmount_min'
Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns]

Data_aggr = Data_aggr.reset_index()
print(Data_aggr)
   Customer_ID Product_ID  SalesAmount_min  SalesAmount_sum  ProductCost_mean  \
0            1          A            12.34            12.34             12.34   
1            1          B            12.34            12.34             12.34   
2            2          D            13.55            27.10             13.55   
3            3          C            34.00            68.00             34.00   
4            4          A            19.15            19.15             19.15   
5            4          B            19.15            19.15             19.15   
6            5          E            13.22            26.44             13.22   

   ProductCost_max  
0            12.34  
1            12.34  
2            13.55  
3            34.00  
4            19.15  
5            19.15  
6            13.22
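With 'first', customers 1 and 4 (who bought both A and B) show only product A, while grouping by both columns gives several rows per customer. If one row per customer is wanted but every product should still be visible, one possible variation (not from the original answers) is to aggregate Product_ID into a string. The comma separator and the sorting are arbitrary choices in this sketch:

```python
import pandas as pd

Data = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
    'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
})

# One row per customer; all distinct products joined into one string
# instead of keeping only the first one.
Data_aggr = (
    Data.groupby('Customer_ID')
        .agg(SalesAmount_sum=('SalesAmount', 'sum'),
             Product_ID=('Product_ID', lambda s: ','.join(sorted(s.unique()))))
        .reset_index()
)
print(Data_aggr)
```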

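Please show your desired output (not just a description).

Data.groupby(['Customer_ID','Product_ID']).agg… seems to work fine, but I cannot tell whether this is what you want.

I edited the question above. I tried it the way you said, but it gave me an error: ValueError: No axis named Product_ID for object type

That is because you misspelled it as prodcut_ID in your input... it should be Product_ID.

"No axis named..." is probably because you did not pass a list as instructed: .groupby(['Customer_ID','Product_ID']). Can you check the brackets? I think you forgot the [] in Data.groupby(['Customer_ID','Product_ID']).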
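A small sketch of the bracket mistake discussed in these comments. Passing the two column names as separate strings does not add a second grouping key; in pandas 1.x/2.x the second positional argument is taken as the axis, which is where "No axis named Product_ID" comes from. The exact exception type and message may differ between pandas versions, so the sketch catches both ValueError and TypeError:

```python
import pandas as pd

Data = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
    'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
})

# Wrong: two separate strings. The second positional argument is not a
# second grouping key, so pandas rejects the call.
wrong_call_error = None
try:
    Data.groupby('Customer_ID', 'Product_ID')
except (ValueError, TypeError) as exc:
    wrong_call_error = exc
    print(type(exc).__name__, exc)

# Right: both keys inside one list.
fixed = Data.groupby(['Customer_ID', 'Product_ID'])['SalesAmount'].sum()
print(fixed)
```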

Thanks for your replies. I am not looking for the values in the columns shown in the picture. The picture has been edited to show the output I want. I am looking for a column grouped by Product ID and Customer ID.

@L.G. - The first solution matches what the picture shows; the second is similar to jpp's suggestion, but the output is different.

@L.G. - The output is different because if you use multiple columns in groupby, like Data.groupby(['Customer_ID','Product_ID']), it groups by each combination of the specified columns, so this output is expected.

@jezreal got the answer. Thanks for your help anyway.

@L.G. - OK, so is the answer different from my 2 solutions?
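To illustrate the point about grouping by several columns: a list of keys creates one group per distinct combination of values, not one group per customer. This minimal sketch counts the groups each way with size():

```python
import pandas as pd

Data = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
})

# One group per customer ...
per_customer = Data.groupby('Customer_ID').size()
# ... versus one group per distinct (Customer_ID, Product_ID) pair.
per_pair = Data.groupby(['Customer_ID', 'Product_ID']).size()

print(per_customer)
print(per_pair)
```

Customers 1 and 4 each bought two different products, so they split into two groups each, giving 7 rows instead of 5; that matches the row count of the second solution.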