Python: How to compute a groupby result from each group's length and the count of a value in another column

python pandas


I want to use groupby in pandas to compute the scoring rate for each zone, but I'm not sure how to do it.

Suppose df has two columns, as follows:

Shot_type   Shot_zone
   Goal     Penalty_area
   Saved    Penalty_area
   Goal     Goal Box
   Saved    Goal Box

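Here, I want to group by Shot_zone and compute the scoring rate as the number of 'Goal' values in Shot_type divided by the len() of each Shot_zone group. Each shot zone here has one Goal and one Saved, so the result should look like this: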

Penalty_area   50%
Goal Box       50%

Is there a clear way to do this with pandas?

Thanks, everyone.

One approach is to binarize the Shot_type column, i.e. set it to True where it equals 'Goal', and then use GroupBy + mean:

res = df.assign(Shot_type=df['Shot_type']=='Goal')\
        .groupby('Shot_zone')['Shot_type'].mean()

print(res)

Shot_zone
Goal Box        0.5
Penalty_area    0.5
Name: Shot_type, dtype: float64
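
For reference, a self-contained version of this approach might look like the sketch below. The DataFrame is rebuilt from the four sample rows in the question; the names `is_goal`, `rate`, and `rate_pct` are illustrative and not part of the original answer, and the percentage formatting is an optional extra.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Goal', 'Saved'],
    'Shot_zone': ['Penalty_area', 'Penalty_area', 'Goal Box', 'Goal Box'],
})

# True where the shot was a goal. The mean of a boolean Series is the
# fraction of True values, i.e. goals / shots in each group.
is_goal = df['Shot_type'].eq('Goal')
rate = is_goal.groupby(df['Shot_zone']).mean()

# Optional: turn the fractions into percentage strings for display.
rate_pct = rate.map('{:.0%}'.format)
print(rate_pct)
```

Keeping `rate` numeric and formatting only at the end means the result can still be sorted or plotted.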

Using groupby with apply also works:

df.groupby('Shot_zone').Shot_type.apply(lambda s: '{}%'.format((s[s=='Goal']).size/(s.size) * 100))

Shot_zone
Goal Box        50.0%
Penalty_area    50.0%

You can do the same thing as follows:

(data[data['Shot_type']=='Goal'].groupby(['Shot_zone'])['Shot_zone'].count()
 / data.groupby(['Shot_zone'])['Shot_zone'].count())
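
One thing to watch with the ratio-of-counts approach: a zone with no goals at all disappears from the numerator, so the division yields NaN for it. The sketch below shows one way to handle that. The 'Corner' zone is hypothetical and added only to trigger the case; `goals`, `shots`, and `rate` are illustrative names.

```python
import pandas as pd

# Sample rows from the question plus a hypothetical 'Corner' zone
# that has shots but no goals.
data = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Goal', 'Saved', 'Saved'],
    'Shot_zone': ['Penalty_area', 'Penalty_area', 'Goal Box', 'Goal Box', 'Corner'],
})

# Number of goals per zone and total shots per zone.
goals = data[data['Shot_type'] == 'Goal'].groupby('Shot_zone').size()
shots = data.groupby('Shot_zone').size()

# Division aligns on the index; zones missing from `goals` become NaN,
# so fill them with 0 to report a 0% scoring rate.
rate = (goals / shots).fillna(0)
print(rate)
```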

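Thanks Wen, this is more than I wanted.

@commentallez You can always use .loc to select the index you need.

I know iloc and loc, but not how to deal with groupby objects... I'm just learning by using it, but I'm completely new to pandas.

Thank you jpp, great answer, although I actually have 4 different shot types, haha.

@commentallez Ah, then use crosstab.

Actually your answer is more accurate... I don't know why crosstab doesn't give me the right result, maybe it's cumulative?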
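
For the case with more than two shot types, the crosstab suggestion from the comments could be sketched as below. The 'Blocked' and 'Missed' shot types are hypothetical, since the question only shows 'Goal' and 'Saved'. Note that `pd.crosstab` returns raw counts by default, which may be one reason the numbers looked wrong; passing `normalize='index'` converts each row into shares of that zone's total.

```python
import pandas as pd

# Hypothetical data with four shot types; only 'Goal' and 'Saved'
# appear in the question itself.
df = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Blocked', 'Missed', 'Goal', 'Saved'],
    'Shot_zone': ['Penalty_area'] * 4 + ['Goal Box'] * 2,
})

# normalize='index' divides each row by its row total, so every row
# sums to 1 and each cell is that shot type's share within the zone.
shares = pd.crosstab(df['Shot_zone'], df['Shot_type'], normalize='index')
print(shares)

# The scoring rate per zone is then just the 'Goal' column.
goal_rate = shares['Goal']
```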
