Python: counting unique column values based on another column
I have a table (example below).
For each kicker value, I want to calculate what percentage of field goals were made and missed.
For each kicker value, I also want to find the number of attempts in each yardage range.

Let's use cut and crosstab:
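import numpy as np
import pandas as pd
# normalize='index' turns each (kicker, range) row into proportions that sum to 1
out = pd.crosstab([df.kicker, pd.cut(df.kick_yards, [20, 30, 40, 50, np.inf], include_lowest=True)],
                  df.kick_result, normalize='index')
out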
Out[228]:
kick_result MADE MISS
kicker kick_yards
X1 (19.999, 30.0] 1.0 0.0
X2 (19.999, 30.0] 1.0 0.0
(30.0, 40.0] 1.0 0.0
X3 (40.0, 50.0] 0.0 1.0
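One detail worth checking in the snippet above is how pd.cut treats the bin edges. The sketch below uses a few made-up yardages (an assumption for illustration, not data from the question) to show that intervals are closed on the right, that include_lowest=True pulls the lowest edge (20) into the first bin, and that anything below that edge becomes NaN and would drop out of the crosstab.

```python
import numpy as np
import pandas as pd

# Hypothetical yardages chosen to sit on or near the bin edges.
yards = pd.Series([18, 20, 30, 31, 58])

binned = pd.cut(yards, [20, 30, 40, 50, np.inf], include_lowest=True)

# 18 is below the lowest edge, so it gets no bin (NaN).
# 20 and 30 share the first bin, 31 opens the next one, 58 lands in (50, inf].
print(pd.DataFrame({"yards": yards, "bin": binned}))
```

If short kicks matter, starting the bins at 0 (as the later answers in this thread do) avoids losing them.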
Using get_dummies and cut, and building a resulting DataFrame:
df['Att'] = 1  # every row is one attempt
# dtype=int keeps the indicators as 0/1 (newer pandas defaults to booleans)
dfmm = pd.get_dummies(df['kick_result'], dtype=int)
cols_A = ['A20','A21-30','A31-40','A41-50','A51+']
cols_M = [x.replace('A','M') for x in cols_A]
# one indicator column per yardage range (attempts)
df_att = pd.get_dummies(pd.cut(df.kick_yards, [0,20,30,40,50,np.inf], include_lowest=True), dtype=int)
df_att.columns = cols_A
# attempts that were also MADE, per range
df_made = df_att.multiply(dfmm['MADE'], axis=0)
df_made.columns = cols_M
dff = pd.concat([df,dfmm,df_att,df_made], axis=1).drop(['kick_result','kick_yards'], axis=1)
Resulting DataFrame:
kicker Att MADE MISS A20 A21-30 A31-40 A41-50 A51+ M20 M21-30 \
0 X1 1 1 0 0 1 0 0 0 0 1
1 X2 1 1 0 0 1 0 0 0 0 1
2 X2 1 1 0 0 1 0 0 0 0 1
3 X2 1 1 0 0 0 1 0 0 0 0
4 X3 1 0 1 0 0 0 1 0 0 0
M31-40 M41-50 M51+
0 0 0 0
1 0 0 0
2 0 0 0
3 1 0 0
4 0 0 0
Aggregation from that DataFrame:
dff.groupby('kicker').agg(['sum'])
Att MADE MISS A20 A21-30 A31-40 A41-50 A51+ M20 M21-30 M31-40 M41-50 \
sum sum sum sum sum sum sum sum sum sum sum sum
kicker
X1 1 1 0 0 1 0 0 0 0 1 0 0
X2 3 3 0 0 2 1 0 0 0 2 1 0
X3 1 0 1 0 0 0 1 0 0 0 0 0
M51+
sum
kicker
X1 0
X2 0
X3 0
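The per-range sums above can be taken one step further to get a success rate for each range. The following is only a sketch under assumptions: it rebuilds the indicator columns on the 13 test rows listed later in this thread (index values omitted), and the names labels, totals and rate are made up for illustration. Ranges with no attempts are reported as NaN rather than 0.

```python
import numpy as np
import pandas as pd

# Test rows from later in the thread (original index values omitted).
df = pd.DataFrame({
    "kicker": ["X1"] * 5 + ["X2"] * 5 + ["X3"] * 3,
    "kick_yards": [18, 28, 38, 48, 58, 30, 27, 32, 32, 42, 46, 26, 56],
    "kick_result": ["MADE", "MADE", "MADE", "MISS", "MISS",
                    "MADE", "MADE", "MADE", "MISS", "MISS",
                    "MISS", "MISS", "MADE"],
})

labels = ["20", "21-30", "31-40", "41-50", "51+"]
yard_range = pd.cut(df["kick_yards"], [0, 20, 30, 40, 50, np.inf], labels=labels)

# A* columns flag an attempt in a range; M* columns flag a made attempt.
att = pd.get_dummies(yard_range, prefix="A", prefix_sep="", dtype=int)
made = att.mul((df["kick_result"] == "MADE").astype(int), axis=0)
made.columns = [c.replace("A", "M", 1) for c in att.columns]

totals = pd.concat([df[["kicker"]], att, made], axis=1).groupby("kicker").sum()

# Made / attempted per range, as a percentage; NaN where nothing was attempted.
attempts = totals[list(att.columns)].to_numpy(dtype=float)
makes = totals[list(made.columns)].to_numpy(dtype=float)
with np.errstate(divide="ignore", invalid="ignore"):
    pct = np.where(attempts > 0, makes / attempts * 100, np.nan)
rate = pd.DataFrame(pct, index=totals.index, columns=labels).round(2)
print(rate)
```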
Since your requirement has two parts:
Part 1: Percentage of field goals made and missed
(df.groupby('kicker')['kick_result']
.value_counts(normalize=True).mul(100).round(2)
.sort_index()
.to_frame(name='Result_%')
).reset_index()
Test run
Test data construction:
To test the various requirements fully, I added test data as follows:
kick_result kick_yards kicker
49 MADE 18.0 X1
50 MADE 28.0 X1
51 MADE 38.0 X1
52 MISS 48.0 X1
53 MISS 58.0 X1
64 MADE 30.0 X2
75 MADE 27.0 X2
158 MADE 32.0 X2
159 MISS 32.0 X2
160 MISS 42.0 X2
259 MISS 46.0 X3
260 MISS 26.0 X3
261 MADE 56.0 X3
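For anyone who wants to reproduce the results, the rows above could be rebuilt as a DataFrame roughly like this. It is a minimal sketch that simply copies the values and index labels shown in the table; how the original poster actually created the frame is not stated.

```python
import pandas as pd

# Values and index labels copied from the test table above.
df = pd.DataFrame(
    {
        "kick_result": ["MADE", "MADE", "MADE", "MISS", "MISS",
                        "MADE", "MADE", "MADE", "MISS", "MISS",
                        "MISS", "MISS", "MADE"],
        "kick_yards": [18.0, 28.0, 38.0, 48.0, 58.0,
                       30.0, 27.0, 32.0, 32.0, 42.0,
                       46.0, 26.0, 56.0],
        "kicker": ["X1"] * 5 + ["X2"] * 5 + ["X3"] * 3,
    },
    index=[49, 50, 51, 52, 53, 64, 75, 158, 159, 160, 259, 260, 261],
)
print(df)
```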
Run the code:
(df.groupby('kicker')['kick_result']
.value_counts(normalize=True).mul(100).round(2)
.sort_index()
.to_frame(name='Result_%')
).reset_index()
Result:
kicker kick_result Result_%
0 X1 MADE 60.00
1 X1 MISS 40.00
2 X2 MADE 60.00
3 X2 MISS 40.00
4 X3 MADE 33.33
5 X3 MISS 66.67
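If a wide layout is preferred (one row per kicker, one column per result), the same percentages can probably be obtained with pd.crosstab and normalize='index'. This is a sketch of an alternative, not the method used in the answer; it reuses the thread's test rows (index values omitted) and the name wide is made up here.

```python
import pandas as pd

# Test rows from this thread (original index values omitted).
df = pd.DataFrame({
    "kicker": ["X1"] * 5 + ["X2"] * 5 + ["X3"] * 3,
    "kick_result": ["MADE", "MADE", "MADE", "MISS", "MISS",
                    "MADE", "MADE", "MADE", "MISS", "MISS",
                    "MISS", "MISS", "MADE"],
})

# Each row is normalised to sum to 1, then scaled to a percentage.
wide = (pd.crosstab(df["kicker"], df["kick_result"], normalize="index")
          .mul(100)
          .round(2))
print(wide)
```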
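Part 2: Field goals by yardage range
We can use pd.crosstab and pd.cut to build a table broken down by yardage range.
It also includes the total number of attempts across all ranges.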
pd.crosstab(index=[df['kicker'], pd.cut(df['kick_yards'],[0, 20, 30, 40, 50, np.inf])],
columns=df['kick_result'],
margins=True, margins_name='Total_Attempts')
Result (using the enriched test data):
kick_result MADE MISS Total_Attempts
kicker kick_yards
X1 (0.0, 20.0] 1 0 1
(20.0, 30.0] 1 0 1
(30.0, 40.0] 1 0 1
(40.0, 50.0] 0 1 1
(50.0, inf] 0 1 1
X2 (20.0, 30.0] 2 0 2
(30.0, 40.0] 1 1 2
(40.0, 50.0] 0 1 1
X3 (20.0, 30.0] 0 1 1
(40.0, 50.0] 0 1 1
(50.0, inf] 1 0 1
Total_Attempts 7 6 13
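To see counts, total attempts and a success percentage side by side for each kicker and range, one option is to compute the extra columns on a plain crosstab instead of using margins. The sketch below is an assumption-laden variant: the string labels passed to pd.cut and the column name MADE_% are invented for readability, and the data are the thread's test rows with index values omitted.

```python
import numpy as np
import pandas as pd

# Test rows from this thread (original index values omitted).
df = pd.DataFrame({
    "kicker": ["X1"] * 5 + ["X2"] * 5 + ["X3"] * 3,
    "kick_yards": [18, 28, 38, 48, 58, 30, 27, 32, 32, 42, 46, 26, 56],
    "kick_result": ["MADE", "MADE", "MADE", "MISS", "MISS",
                    "MADE", "MADE", "MADE", "MISS", "MISS",
                    "MISS", "MISS", "MADE"],
})

# Hypothetical readable labels for the same bin edges used in the answer.
labels = ["0-20", "21-30", "31-40", "41-50", "51+"]
yard_range = pd.cut(df["kick_yards"], [0, 20, 30, 40, 50, np.inf], labels=labels)

counts = pd.crosstab([df["kicker"], yard_range], df["kick_result"])
counts["Total_Attempts"] = counts["MADE"] + counts["MISS"]
counts["MADE_%"] = (counts["MADE"] / counts["Total_Attempts"] * 100).round(2)
print(counts)
```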
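This works, but I need it for each kicker value, so this only gives me the totals.
@Gamecocks20 check the update~
Is there also a way to add the total number of attempts to this frame?
@Gamecocks20 check margins in crosstab
@BENY shouldn't X2 have 3 instead of 2?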