Python: count unique column values based on another column

Tags: python, pandas

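I have a table (example below).

For each kicker value, I want to calculate what percentage of field goals were made and missed.

For each kicker value, I also want to find the number of kicks in each yard range.


Let's use cut and crosstab: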

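import numpy as np
import pandas as pd

# Bin kick_yards into ranges; normalize='index' turns each row into fractions
out = pd.crosstab([df.kicker, pd.cut(df.kick_yards, [20, 30, 40, 50, np.inf], include_lowest=True)],
                  df.kick_result, normalize='index')

out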
Out[228]: 
kick_result            MADE  MISS
kicker kick_yards                
X1     (19.999, 30.0]   1.0   0.0
X2     (19.999, 30.0]   1.0   0.0
       (30.0, 40.0]     1.0   0.0
X3     (40.0, 50.0]     0.0   1.0
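
The question's sample table is not reproduced here, so the following is only a minimal sketch with made-up rows (the yard values are assumptions chosen to fall into the bins above). It shows the same cut + crosstab idea twice: once as row fractions and once as raw counts, since a fraction of 1.0 can rest on a single kick.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the yard values are assumptions, not the asker's data
df = pd.DataFrame({
    "kicker": ["X1", "X2", "X2", "X2", "X3"],
    "kick_yards": [25, 22, 28, 35, 45],
    "kick_result": ["MADE", "MADE", "MADE", "MADE", "MISS"],
})

bins = [20, 30, 40, 50, np.inf]
yard_range = pd.cut(df["kick_yards"], bins, include_lowest=True)

# Fraction of MADE / MISS within each (kicker, yard range) row
pct = pd.crosstab([df["kicker"], yard_range], df["kick_result"], normalize="index")

# Raw counts with the same row index, useful for judging sample size
counts = pd.crosstab([df["kicker"], yard_range], df["kick_result"])

print(pct)
print(counts)
```

Multiplying `pct` by 100 would give percentages rather than fractions.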

Use get_dummies and cut to build a resulting DataFrame:

df['Att'] = 1  # one attempt per row

# One-hot encode the result; dtype=int yields 0/1 rather than booleans
dfmm = pd.get_dummies(df['kick_result'], dtype=int)

cols_A = ['A20','A21-30','A31-40','A41-50','A51+']
cols_M = [x.replace('A','M') for x in cols_A]

# Attempts per yard range
df_att = pd.get_dummies(pd.cut(df.kick_yards, [0,20,30,40,50,np.inf], include_lowest=True), dtype=int)
df_att.columns = cols_A

# Made kicks per yard range: attempts masked by the MADE indicator
df_made = df_att.multiply(dfmm['MADE'], axis=0)
df_made.columns = cols_M

dff = pd.concat([df,dfmm,df_att,df_made], axis=1).drop(['kick_result','kick_yards'], axis=1)

Resulting DataFrame:

  kicker  Att  MADE  MISS  A20  A21-30  A31-40  A41-50  A51+  M20  M21-30  \
0     X1    1     1     0    0       1       0       0     0    0       1   
1     X2    1     1     0    0       1       0       0     0    0       1   
2     X2    1     1     0    0       1       0       0     0    0       1   
3     X2    1     1     0    0       0       1       0     0    0       0   
4     X3    1     0     1    0       0       0       1     0    0       0   

   M31-40  M41-50  M51+  
0       0       0     0  
1       0       0     0  
2       0       0     0  
3       1       0     0  
4       0       0     0  

Aggregation from this DataFrame:

dff.groupby('kicker').agg(['sum'])

       Att MADE MISS A20 A21-30 A31-40 A41-50 A51+ M20 M21-30 M31-40 M41-50  \
       sum  sum  sum sum    sum    sum    sum  sum sum    sum    sum    sum   
kicker                                                                        
X1       1    1    0   0      1      0      0    0   0      1      0      0   
X2       3    3    0   0      2      1      0    0   0      2      1      0   
X3       1    0    1   0      0      0      1    0   0      0      0      0   

       M51+  
        sum  
kicker       
X1        0  
X2        0  
X3        0  
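
A possible variation on the approach above, offered as a sketch rather than as the answerer's method: passing `labels=` to `pd.cut` names each bin directly, so the dummy columns need no renaming afterwards. The sample rows and the label strings are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows (yard values are assumptions)
df = pd.DataFrame({
    "kicker": ["X1", "X2", "X2", "X2", "X3"],
    "kick_yards": [25, 22, 28, 35, 45],
    "kick_result": ["MADE", "MADE", "MADE", "MADE", "MISS"],
})

bins = [0, 20, 30, 40, 50, np.inf]
labels = ["20", "21-30", "31-40", "41-50", "51+"]

# labels= names every bin up front
rng = pd.cut(df["kick_yards"], bins, labels=labels, include_lowest=True)

# Attempts per range: columns A20, A21-30, ...
att = pd.get_dummies(rng, prefix="A", prefix_sep="", dtype=int)

# Made kicks per range: attempts masked by a 0/1 MADE indicator
made = att.mul((df["kick_result"] == "MADE").astype(int), axis=0)
made.columns = [c.replace("A", "M", 1) for c in att.columns]

# One row per kicker with summed attempts and made kicks
summary = pd.concat([df[["kicker"]], att, made], axis=1).groupby("kicker").sum()
print(summary)
```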

Your requirement has two parts:

  • field goal %, and
  • kicks by yard range

Let's tackle them one at a time.

    Part 1: Field goal %

    We can use .groupby() together with .value_counts() to get:

    (df.groupby('kicker')['kick_result']
       .value_counts(normalize=True).mul(100).round(2)
       .sort_index()
       .to_frame(name='Result_%')
    ).reset_index()
    
    Test run. Test data construction:

    
    To test all the requirements fully, I added test data as follows:

        kick_result kick_yards  kicker
    49     MADE       18.0       X1 
    50     MADE       28.0       X1
    51     MADE       38.0       X1
    52     MISS       48.0       X1
    53     MISS       58.0       X1
    64     MADE       30.0       X2
    75     MADE       27.0       X2
    158    MADE       32.0       X2
    159    MISS       32.0       X2
    160    MISS       42.0       X2
    259    MISS       46.0       X3
    260    MISS       26.0       X3
    261    MADE       56.0       X3
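
For anyone who wants to reproduce the run, the listing above can be rebuilt as a DataFrame along these lines. The values and index labels are copied from the listing; how the answerer actually constructed the frame is not shown, so this is just one way to do it.

```python
import pandas as pd

# Rebuild the test data shown above; index labels are copied from the listing
df = pd.DataFrame(
    {
        "kick_result": ["MADE", "MADE", "MADE", "MISS", "MISS",
                        "MADE", "MADE", "MADE", "MISS", "MISS",
                        "MISS", "MISS", "MADE"],
        "kick_yards": [18.0, 28.0, 38.0, 48.0, 58.0,
                       30.0, 27.0, 32.0, 32.0, 42.0,
                       46.0, 26.0, 56.0],
        "kicker": ["X1"] * 5 + ["X2"] * 5 + ["X3"] * 3,
    },
    index=[49, 50, 51, 52, 53, 64, 75, 158, 159, 160, 259, 260, 261],
)
print(df)
```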
    
    Running the code:

    (df.groupby('kicker')['kick_result']
       .value_counts(normalize=True).mul(100).round(2)
       .sort_index()
       .to_frame(name='Result_%')
    ).reset_index()
    
    
    Result:

      kicker kick_result  Result_%
    0     X1        MADE     60.00
    1     X1        MISS     40.00
    2     X2        MADE     60.00
    3     X2        MISS     40.00
    4     X3        MADE     33.33
    5     X3        MISS     66.67
    
    
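    Part 2: Field goals by yard range

    We can use pd.cut() and pd.crosstab() to build a table broken down by yard range.

    The table also includes the total number of attempts across all ranges.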

    pd.crosstab(index=[df['kicker'], pd.cut(df['kick_yards'],[0, 20, 30, 40, 50, np.inf])], 
                columns=df['kick_result'], 
                margins=True, margins_name='Total_Attempts')
    
    Result (using the enriched test data):

                            kick_result  MADE  MISS  Total_Attempts
            kicker           kick_yards                              
                X1          (0.0, 20.0]     1     0               1
                           (20.0, 30.0]     1     0               1
                           (30.0, 40.0]     1     0               1
                           (40.0, 50.0]     0     1               1
                            (50.0, inf]     0     1               1
                X2         (20.0, 30.0]     2     0               2
                           (30.0, 40.0]     1     1               2
                           (40.0, 50.0]     0     1               1
                X3         (20.0, 30.0]     0     1               1
                           (40.0, 50.0]     0     1               1
                            (50.0, inf]     1     0               1
    Total_Attempts                          7     6              13
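
The margins row above gives totals over all kickers. If per-row attempt counts, a made percentage per range, and per-kicker totals are wanted in one place, something like the following sketch may help. The column names `Attempts` and `MADE_%` and the `per_kicker` frame are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same test data as listed earlier in this answer
df = pd.DataFrame({
    "kick_result": ["MADE", "MADE", "MADE", "MISS", "MISS",
                    "MADE", "MADE", "MADE", "MISS", "MISS",
                    "MISS", "MISS", "MADE"],
    "kick_yards": [18.0, 28.0, 38.0, 48.0, 58.0,
                   30.0, 27.0, 32.0, 32.0, 42.0,
                   46.0, 26.0, 56.0],
    "kicker": ["X1"] * 5 + ["X2"] * 5 + ["X3"] * 3,
})

yard_range = pd.cut(df["kick_yards"], [0, 20, 30, 40, 50, np.inf])
table = pd.crosstab([df["kicker"], yard_range], df["kick_result"])

# Attempts per (kicker, yard range) row
table["Attempts"] = table["MADE"] + table["MISS"]

# Drop ranges with no attempts, in case the pandas version in use keeps empty bins
table = table[table["Attempts"] > 0].copy()

# Share of made kicks within each row, as a percentage
table["MADE_%"] = (table["MADE"] / table["Attempts"] * 100).round(2)

# Totals per kicker rather than over the whole table
per_kicker = table.groupby(level="kicker")[["MADE", "MISS", "Attempts"]].sum()

print(table)
print(per_kicker)
```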
    

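    This works, but I need it for each value of kicker; as it stands, it only gives me the totals.

    @Gamecocks20 check the update.

    Is there also a way to add the total number of attempts to this frame?

    @Gamecocks20 check the margins option in crosstab.

    @BENY shouldn't X2 have 3 instead of 2?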