Python Pandas Groupby: calculating each value's percentage of its group total
I have the following statement:
df['teams'].groupby(train_sub['output']).value_counts()
It returns something like this:
outcome | teams
--------|----------------|-----
win | Man utd | 120
| Chelsea | 75
| Arsenal | 10
--------|----------------|------
loss | Man utd | 30
| Chelsea | 75
| Arsenal | 150
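For each team, I want to show each outcome as a fraction of that team's total (not of the total number of entries in the data frame). For example: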
outcome | teams
--------|----------------|-----
win | Man utd | 0.80
| Chelsea | 0.5
| Arsenal | 0.0625
--------|----------------|------
loss | Man utd | 0.20
| Chelsea | 0.5
| Arsenal | 0.9375
Please tell me how I can get this result.
Replicating a dataset like yours:
import pandas as pd

df = pd.DataFrame()
df['outcome'] = ['win', 'win', 'win', 'loss', 'loss', 'loss']
df['teams'] = ['manu', 'chelsea', 'arsenal', 'manu', 'chelsea', 'arsenal']
df['points'] = [120, 75, 10, 30, 75, 150]
# Total points for each (outcome, teams) pair
grouped = df.groupby(['outcome', 'teams'])['points'].sum()
My grouped variable now looks similar to yours:
points
outcome teams
loss arsenal 150
chelsea 75
manu 30
win arsenal 10
chelsea 75
manu 120
Solution:
In your case, grouped is the result of df['teams'].groupby(train_sub['output']).value_counts(). So divide each value by the total for its team (index level 1):
grouped / grouped.groupby(level = 1).sum()
Output:
outcome teams points
loss arsenal 0.9375
chelsea 0.5000
manu 0.2000
win arsenal 0.0625
chelsea 0.5000
manu 0.8000
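The division can be reproduced end to end with a minimal sketch like the one below. It assumes the replicated dataset from this answer. One substitution of my own, not part of the original answer: it uses groupby(level="teams").transform("sum") so that the team totals already have the same index as grouped, instead of relying on pandas aligning a two-level index against a one-level one. The name share is only illustrative.

```python
import pandas as pd

# Replicated dataset from the answer: points per (outcome, team) pair.
df = pd.DataFrame({
    "outcome": ["win", "win", "win", "loss", "loss", "loss"],
    "teams": ["manu", "chelsea", "arsenal", "manu", "chelsea", "arsenal"],
    "points": [120, 75, 10, 30, 75, 150],
})

# Series indexed by (outcome, teams).
grouped = df.groupby(["outcome", "teams"])["points"].sum()

# Total per team, repeated for every row of that team, so the result
# has exactly the same MultiIndex as `grouped`.
team_totals = grouped.groupby(level="teams").transform("sum")

# Fraction of each team's total that falls under each outcome.
share = grouped / team_totals
print(share)
```

For arsenal, for example, the team total is 10 + 150 = 160, so the loss share is 150 / 160 = 0.9375 and the win share is 10 / 160 = 0.0625, and the shares for each team add up to 1.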
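Awesome, thanks @ankur. What if I want the whole output to stay as it is, but with only the points column sorted in descending order within each category? That is, how can I see the wins and losses sorted by the highest percentage in each category?
What you can do first is: x = (grouped / grouped.groupby(level=1).sum()).reset_index()
and then: x.sort_values(['outcome', 'points'], ascending=[False, False])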
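The sorting step from this comment can be sketched as a self-contained example. It assumes the same replicated dataset as the answer; the names share and x_sorted are illustrative, and the per-team totals are computed with transform("sum") as a stand-in for the division by grouped.groupby(level=1).sum().

```python
import pandas as pd

df = pd.DataFrame({
    "outcome": ["win", "win", "win", "loss", "loss", "loss"],
    "teams": ["manu", "chelsea", "arsenal", "manu", "chelsea", "arsenal"],
    "points": [120, 75, 10, 30, 75, 150],
})

grouped = df.groupby(["outcome", "teams"])["points"].sum()
share = grouped / grouped.groupby(level="teams").transform("sum")

# Turn the (outcome, teams) index back into ordinary columns; the
# values keep the Series name, so the columns are outcome, teams, points.
x = share.reset_index()

# Sort by outcome first, then by share within each outcome, both descending.
x_sorted = x.sort_values(["outcome", "points"], ascending=[False, False])
print(x_sorted)
```

With both flags set to False, the win rows come first because "win" sorts after "loss" alphabetically. Passing ascending=[True, False] instead would list the loss rows first while still ranking each block from the highest share down.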