Python 3.x pandas groupby: frequency calculation on subgroups, inserting new rows, and rearranging columns

I need some help performing a few operations on subgroups, and I am really confused. I will try to describe the operations and the desired output briefly, with comments.

(1) Calculate the percentage frequency of occurrence within each subgroup

(2) Show records that do not exist, filled with 0

(3) Rearrange the order of the records and the columns

Assume the following df as the raw data:

import numpy as np
import pandas as pd

df=pd.DataFrame({'store':[1,1,1,2,2,2,3,3,3,3],
                 'branch':['A','A','C','C','C','C','A','A','C','A'],
                 'products':['clothes', 'shoes', 'clothes', 'shoes', 'accessories', 'clothes', 'bags', 'bags', 'clothes', 'clothes']})

grouped_df=df.groupby(['store', 'branch', 'products']).size().unstack('products').replace({np.nan:0})

# output:
products      accessories  bags  clothes  shoes
store branch                                   
1     A               0.0   0.0      1.0    1.0
      C               0.0   0.0      1.0    0.0
2     C               1.0   0.0      1.0    1.0
3     A               0.0   2.0      1.0    0.0
      C               0.0   0.0      1.0    0.0

# desired output, if (1), (2) and (3) are all applied:
products      clothes  shoes  accessories  bags
store branch                                   
1     B             0      0            0     0  #store 1 has 3 records in total (2 clothes and 1 shoes across branches A and C), so each count of 1 becomes 33.3%
      A          33.3   33.3            0     0
      C          33.3    0.0            0     0
2     B             0      0            0     0
      A             0      0            0     0
      C          33.3   33.3         33.3     0
3     B             0      0            0     0  #store 3 has 4 records in total (2 bags and 2 clothes across branches A and C), so the 2 bags become 50%, and so on
      A            25      0            0    50
      C            25      0            0     0
# (3) columns rearranged so that "clothes" and "shoes" come first
# (3)+(2) branch B has been added and the order of the branches changed to B, A, C
# (1) the percentages of the occurrences are calculated within each store, as the comments above try to explain
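
One way to read the three requirements above as code is sketched below. It is not taken from the original post: the names `full_index`, `store_totals` and `pct` are illustrative, and the list of branches (including the absent B) is assumed to be known in advance.

```python
import pandas as pd

df = pd.DataFrame({
    'store': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'branch': ['A', 'A', 'C', 'C', 'C', 'C', 'A', 'A', 'C', 'A'],
    'products': ['clothes', 'shoes', 'clothes', 'shoes', 'accessories',
                 'clothes', 'bags', 'bags', 'clothes', 'clothes'],
})

# Count occurrences per (store, branch) row and product column.
counts = (df.groupby(['store', 'branch', 'products']).size()
            .unstack('products', fill_value=0))

# (2) Add every store x branch combination, in the order B, A, C (assumed).
full_index = pd.MultiIndex.from_product(
    [sorted(df['store'].unique()), ['B', 'A', 'C']],
    names=['store', 'branch'],
)
counts = counts.reindex(full_index, fill_value=0)

# (1) Divide each row by the total number of records in its store.
store_totals = counts.sum(axis=1).groupby(level='store').transform('sum')
pct = counts.div(store_totals, axis=0).mul(100).round(1)

# (3) Put "clothes" and "shoes" first.
pct = pct[['clothes', 'shoes', 'accessories', 'bags']]
print(pct)
```

Because the full index is built with `from_product`, store 2 also gets an all-zero row for branch A, as in the desired output.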

grouped_df.loc[[1]].transform(lambda x: x*100/sum(x)).round(0)
products      accessories  bags  clothes  shoes
store branch                                   
1     A               NaN   NaN     50.0  100.0  #why has it transformed on axis='columns'?
      C               NaN   NaN     50.0    0.0
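
As far as I can tell, the result above follows from how `DataFrame.transform` works: the function is called once per column, so `sum(x)` is the total of that column rather than of the whole store. A minimal sketch on the store 1 counts (the frame `store1` is retyped by hand from the table above, and the variable names are illustrative):

```python
import pandas as pd

# Counts for store 1 only, non-empty columns (values copied from the table above).
store1 = pd.DataFrame(
    {'clothes': [1, 1], 'shoes': [1, 0]},
    index=pd.Index(['A', 'C'], name='branch'),
)

# The function receives one column at a time, so x.sum() is a column total.
per_column = store1.transform(lambda x: x * 100 / x.sum())

# Dividing by the grand total of the block gives the share within the store.
per_store = store1 * 100 / store1.to_numpy().sum()

print(per_column)
print(per_store.round(1))
```

The all-zero columns (accessories, bags) divide 0 by 0 in the per-column version, which is where the NaN values come from.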



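# Reconstructed from a garbled listing: the unstack/stack/fillna/transform/round steps are an assumption.
grouped_df = df.groupby(['store', 'branch', 'products']).size()\
        .unstack('branch')\
        .reindex(['B','C','A'], axis=1, fill_value=0)\
        .stack().unstack('products').fillna(0)\
        .transform(lambda x: x*100/df.groupby(['store']).size()).round(1)\
        .reindex(['clothes', 'shoes', 'accessories', 'bags'], axis='columns')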

products      accessories  bags  clothes  shoes
store branch                                   
1     B               0.0   0.0      0.0    0.0
      C               0.0   0.0     33.3    0.0
      A               0.0   0.0     33.3   33.3
2     B               0.0   0.0      0.0    0.0
      C              33.3   0.0     33.3   33.3
3     B               0.0   0.0      0.0    0.0
      C               0.0   0.0     25.0    0.0
      A               0.0  50.0     25.0    0.0
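
The division by the store size can also be written without a lambda. The sketch below is an alternative, not the answer's own code: it assumes `pd.crosstab` for the counts and `DataFrame.div` with `level='store'`, so that each store's total is broadcast over its branches. The variable names are illustrative, and the reindexing for the missing branch B is left out.

```python
import pandas as pd

df = pd.DataFrame({
    'store': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'branch': ['A', 'A', 'C', 'C', 'C', 'C', 'A', 'A', 'C', 'A'],
    'products': ['clothes', 'shoes', 'clothes', 'shoes', 'accessories',
                 'clothes', 'bags', 'bags', 'clothes', 'clothes'],
})

# Rows are (store, branch), columns are products, cells are counts.
counts = pd.crosstab([df['store'], df['branch']], df['products'])

# Number of records per store, indexed by 'store'.
store_sizes = df.groupby('store').size()

# Align the per-store sizes with the 'store' level of the row index.
pct = counts.div(store_sizes, axis=0, level='store').mul(100)

# Sanity check: the percentages of each store should add up to 100.
check = pct.sum(axis=1).groupby(level='store').sum()

print(pct.round(1))
print(check)
```

Summing the percentages per store is a cheap way to confirm that the denominator really is the store total and not a column or row total.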


