Python Pandas GroupBy: calculating each value as a percentage of its group total

Tags: python, pandas, pandas-groupby

I have the following statement:

df['teams'].groupby(train_sub['outcome']).value_counts()

which returns something like this:

outcome | teams 
--------|----------------|-----
  win   | Man utd        | 120
        | Chelsea        | 75
        | Arsenal        | 10
--------|----------------|------
  loss  | Man utd        | 30
        | Chelsea        | 75
        | Arsenal        | 150

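For each team, I want to show each outcome as a percentage of that team's total (not of the total number of entries in the dataframe). For example: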

outcome | teams 
--------|----------------|-----
  win   | Man utd        | 0.80
        | Chelsea        | 0.5
        | Arsenal        | 0.0625
--------|----------------|------
  loss  | Man utd        | 0.20
        | Chelsea        | 0.5
        | Arsenal        | 0.9375
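
To make the target numbers concrete, here is a minimal sketch in plain Python that recomputes them from the counts in the first table. The names counts and shares are only illustrative; each count is divided by the total of its own team.

```python
# Counts taken from the table in the question: (wins, losses) per team
counts = {"Man utd": (120, 30), "Chelsea": (75, 75), "Arsenal": (10, 150)}

shares = {}
for team, (wins, losses) in counts.items():
    total = wins + losses  # the team's own total, not the size of the dataframe
    shares[team] = {"win": wins / total, "loss": losses / total}
    print(team, shares[team])
```

The pandas question is then how to do this division for every team at once.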

Can you tell me how to get this result?

Replicating the dataset as you have it:

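import pandas as pd

# Rebuild the counts from the question as a flat table
df = pd.DataFrame()
df['outcome'] = ['win', 'win', 'win', 'loss', 'loss', 'loss']
df['teams'] = ['manu', 'chelsea', 'arsenal', 'manu', 'chelsea', 'arsenal']
df['points'] = [120, 75, 10, 30, 75, 150]
# Sum points per (outcome, teams) pair; the result is a Series with a two-level index
grouped = df.groupby(['outcome', 'teams'])['points'].sum()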

My grouped variable now looks similar to yours:
                 points
outcome teams          
loss    arsenal     150
        chelsea      75
        manu         30
win     arsenal      10
        chelsea      75
        manu        120

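Solution:
In your case, grouped is the result of df['teams'].groupby(train_sub['outcome']).value_counts(). Grouping it by level 1 of its index (teams) and summing gives each team's total across both outcomes, so just divide by that:
grouped / grouped.groupby(level=1).sum()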

Output:
outcome teams    points     
loss    arsenal  0.9375
        chelsea  0.5000
        manu     0.2000
win     arsenal  0.0625
        chelsea  0.5000
        manu     0.8000
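
As a possible variant of the division above, the sketch below spells out the per-team total with transform, so the divisor has the same index as grouped and no index alignment is needed. The names team_total, share and check are illustrative and do not come from the answer. It also refers to the index level by name ('teams') rather than by position, which assumes the index levels are named as in the replicated dataset.

```python
import pandas as pd

# Same replicated dataset as in the answer
df = pd.DataFrame({
    "outcome": ["win", "win", "win", "loss", "loss", "loss"],
    "teams": ["manu", "chelsea", "arsenal", "manu", "chelsea", "arsenal"],
    "points": [120, 75, 10, 30, 75, 150],
})
grouped = df.groupby(["outcome", "teams"])["points"].sum()

# Per-team total, repeated on every row of that team (same index as grouped)
team_total = grouped.groupby(level="teams").transform("sum")
share = grouped / team_total

# Sanity check: the shares of each team should add up to 1
check = share.groupby(level="teams").sum()
print(share)
print(check)
```

If the check does not come out as 1 for every team, the grouping level is probably not the one intended.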

Awesome, thanks @ankur. What if I want the whole output to stay as it is, but with the points column sorted in descending order within each outcome category? In other words, how can I see both the wins and the losses sorted by highest percentage?
What you can do first is:
x = (grouped / grouped.groupby(level=1).sum()).reset_index()
and then do:
x.sort_values(['outcome', 'points'], ascending=[False, False])
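
The two steps from the reply can be put together in a self-contained sketch, reusing the replicated dataset from the answer. The name ranked is illustrative, and the per-team total is written with transform here, which is a substitution for the level-based division in the reply rather than the reply's exact code.

```python
import pandas as pd

df = pd.DataFrame({
    "outcome": ["win", "win", "win", "loss", "loss", "loss"],
    "teams": ["manu", "chelsea", "arsenal", "manu", "chelsea", "arsenal"],
    "points": [120, 75, 10, 30, 75, 150],
})
grouped = df.groupby(["outcome", "teams"])["points"].sum()

# Share of each team's total, flattened back into ordinary columns
x = (grouped / grouped.groupby(level="teams").transform("sum")).reset_index()

# 'win' sorts before 'loss' in descending order; within each outcome,
# rows are ordered from the highest share to the lowest
ranked = x.sort_values(["outcome", "points"], ascending=[False, False])
print(ranked)
```

Sorting on outcome first keeps the wins and losses in separate blocks, so the ordering by points only applies within each block.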