Python: How do I run a groupby based on the result of another/previous groupby?


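Suppose you sell a product worldwide and want to set up a sales office somewhere in a major city. Your decision will be based purely on sales figures.

This is your (simplified) sales data: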

import pandas as pd

df = {
    'Product': 'Chair',
    'Country': ['USA', 'USA', 'China', 'China', 'China', 'China', 'India',
                'India', 'India', 'India', 'India', 'India', 'India'],
    'Region': ['USA_West', 'USA_East', 'China_West', 'China_East', 'China_South', 'China_South',
               'India_North', 'India_North', 'India_North', 'India_West', 'India_West',
               'India_East', 'India_South'],
    'City': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
    'Sales': [1000, 1000, 1200, 200, 200, 200, 500, 350, 350, 100, 700, 50, 50]
}

# The scalar 'Chair' is broadcast to every row
dff = pd.DataFrame.from_dict(df)

dff

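Based on this data, you should choose city "G".

The logic should be:

1) Find the country with the maximum total sales

2) Within that country, find the region with the maximum total sales

3) Within that region, find the city with the maximum sales

I tried: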

groupby('Product','City').apply(lambda x: x.nlargest(1))

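but this does not work, because it would suggest city "C". That is the city with the highest sales globally, but China is not the country with the highest sales.

I probably have to go through several groupby rounds: based on each result, filter the original dataframe and run groupby again at the next level.

To add to the complexity, you also sell other products (not just "Chair", but other furniture as well). You would have to store the result of each iteration somewhere (for example, the country with maximum sales per product) and then use it in the next groupby iteration.

Do you have any ideas on how I could implement this in pandas/Python?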

The idea is to aggregate with sum at each level, take the top value with idxmax, and use that value to filter the next level.
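A minimal sketch of this level-by-level drill-down, assuming the sales data from the question (rebuilt here so the block runs on its own). The variable names top_country, top_region and top_city are made up for illustration and are not necessarily what the answer used:

```python
import pandas as pd

# Sales data from the question; the scalar 'Chair' is broadcast to all rows
dff = pd.DataFrame({
    'Product': 'Chair',
    'Country': ['USA', 'USA', 'China', 'China', 'China', 'China', 'India',
                'India', 'India', 'India', 'India', 'India', 'India'],
    'Region': ['USA_West', 'USA_East', 'China_West', 'China_East', 'China_South',
               'China_South', 'India_North', 'India_North', 'India_North',
               'India_West', 'India_West', 'India_East', 'India_South'],
    'City': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
    'Sales': [1000, 1000, 1200, 200, 200, 200, 500, 350, 350, 100, 700, 50, 50],
})

# 1) Country with the largest total sales
top_country = dff.groupby('Country')['Sales'].sum().idxmax()
in_country = dff[dff['Country'] == top_country]

# 2) Within that country, region with the largest total sales
top_region = in_country.groupby('Region')['Sales'].sum().idxmax()
in_region = in_country[in_country['Region'] == top_region]

# 3) Within that region, city with the largest total sales
top_city = in_region.groupby('City')['Sales'].sum().idxmax()

print(top_country, top_region, top_city)  # India India_North G
```

Note that idxmax returns only the first label when several groups tie for the maximum, so ties would need separate handling.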

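One approach is to add group totals and then sort the dataframe. This goes beyond your requirement, because it orders all of your data according to your preference logic: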

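# df is the dictionary from the question
df = pd.DataFrame.from_dict(df)

# Add one total column per level: Country_Total, Region_Total, City_Total
factors = ['Country', 'Region', 'City']
for factor in factors:
    df[f'{factor}_Total'] = df.groupby(factor)['Sales'].transform('sum')

# Sort by country total first, then region total, then city total
res = df.sort_values([f'{x}_Total' for x in factors], ascending=False)

print(res.head(5))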

   City Country Product       Region  Sales  Country_Total  Region_Total  \
6     G   India   Chair  India_North    500           2100          1200   
7     H   India   Chair  India_North    350           2100          1200   
8     I   India   Chair  India_North    350           2100          1200   
10    K   India   Chair   India_West    700           2100           800   
9     J   India   Chair   India_West    100           2100           800   

    City_Total  
6          500  
7          350  
8          350  
10         700  
9          100

So for the most desirable option you can use res.iloc[0], for the second res.iloc[1], and so on.
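The question also mentions several products. One possible way to extend the totals-and-sort approach, sketched here under assumptions: the sales frame below is invented purely for illustration (it is not data from the question), and the totals are grouped by Product plus the cumulative level keys so that identical region or city names in different countries are not mixed together.

```python
import pandas as pd

# Hypothetical two-product data, made up for this sketch
sales = pd.DataFrame({
    'Product': ['Chair'] * 4 + ['Table'] * 4,
    'Country': ['USA', 'USA', 'China', 'China'] * 2,
    'Region': ['USA_West', 'USA_East', 'China_West', 'China_East'] * 2,
    'City': ['A', 'B', 'C', 'D'] * 2,
    'Sales': [900, 1000, 1200, 200, 100, 200, 700, 300],
})

levels = ['Country', 'Region', 'City']
total_cols = []
for i, level in enumerate(levels):
    # Total per product at this level, nested inside the higher levels
    keys = ['Product'] + levels[:i + 1]
    col = f'{level}_Total'
    sales[col] = sales.groupby(keys)['Sales'].transform('sum')
    total_cols.append(col)

# Within each product, order rows by country total, region total, city total
ranked = sales.sort_values(
    ['Product'] + total_cols,
    ascending=[True] + [False] * len(total_cols),
)

# First row per product is the preferred city for that product
best = ranked.groupby('Product').head(1)
print(best[['Product', 'Country', 'Region', 'City']])
```

With this made-up data the Chair row comes out as city B (USA has the larger country total even though city C has the largest single figure), and the Table row as city C.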

Works great! Thank you, jezrael! I didn't know about the idxmax() method.
