Python pandas: group by multiple columns and broadcast the result back to the original DataFrame

I have a DataFrame of the following form:

       bowler  inning  wickets  Total_wickets  matches  balls
0  SL Malinga       1       69            143       44   4078
1  SL Malinga       2       74            143       54   4735
2    A Mishra       1       48            124       50   3908
3    A Mishra       2       76            124       62   4930
4    DJ Bravo       1       61            122       48   3887
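To try the snippets in this thread, you can rebuild the sample frame from the table above. The values are copied verbatim. The thread calls the frame both `df` and `df_bowler`, so the sketch below defines both names.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'bowler': ['SL Malinga', 'SL Malinga', 'A Mishra', 'A Mishra', 'DJ Bravo'],
    'inning': [1, 2, 1, 2, 1],
    'wickets': [69, 74, 48, 76, 61],
    'Total_wickets': [143, 143, 124, 124, 122],
    'matches': [44, 54, 50, 62, 48],
    'balls': [4078, 4735, 3908, 4930, 3887],
})
# The question refers to the same frame as df_bowler.
df_bowler = df.copy()
print(df)
```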

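I want to group this df by 'bowler' and 'inning', perform some calculations on the 'wickets' and 'balls' columns, and then broadcast the result back to the same df as a new column. One of the approaches I tried was transform, for example: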

df_bowler['strike rate'] = df_bowler.groupby(['bowler','inning']).transform(lambda x : x['balls']/x['wickets'])

This raises a KeyError exception:

KeyError: ('balls', 'occurred at index wickets')
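Here is a minimal reproduction, assuming a reasonably recent pandas. `groupby(...).transform` passes the lambda one column at a time, as a Series indexed by row labels. The lookup `x['balls']` is therefore treated as a row label that does not exist. The sample data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df_bowler = pd.DataFrame({
    'bowler': ['SL Malinga', 'SL Malinga', 'A Mishra', 'A Mishra', 'DJ Bravo'],
    'inning': [1, 2, 1, 2, 1],
    'wickets': [69, 74, 48, 76, 61],
    'Total_wickets': [143, 143, 124, 124, 122],
    'matches': [44, 54, 50, 62, 48],
    'balls': [4078, 4735, 3908, 4930, 3887],
})

error = None
try:
    df_bowler.groupby(['bowler', 'inning']).transform(
        lambda x: x['balls'] / x['wickets'])
except KeyError as exc:
    # x is a single column (e.g. 'wickets'), not the whole group,
    # so 'balls' is looked up as a row label and is missing.
    error = exc

print(repr(error))
```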

I got what I needed by using apply and merge, for example:

df_strRate = df_bowler.groupby(['bowler','inning']).apply(lambda x:x['balls']/x['wickets']).reset_index(level=2,drop=True).reset_index(name='strike rate')
df_bowler = df_bowler.merge(df_strRate,on=['bowler','inning'])

However, this seems like a roundabout way of doing it. I would like to know why transform fails in this case. Any suggestions?

Thanks.

Your transform fails because you applied it along the wrong axis, and you need to aggregate first, e.g. with sum(). Look at this:

In [83]: df.groupby(['bowler', 'inning']).sum().transform(lambda x : x['balls'].astype(float)/x['wickets'].astype(float), axis=1)
Out[83]: 
bowler      inning
A Mishra    1         81.416667
            2         64.868421
DJ Bravo    1         63.721311
SL Malinga  1         59.101449
            2         63.986486
dtype: float64
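On newer pandas versions, `DataFrame.transform` may reject a function that collapses each row to a single value, because transform expects output shaped like its input. The sketch below does the same calculation with a row-wise `apply`. It then uses `join` to broadcast each per-group value back onto the original rows. The column name `strike_rate` is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'bowler': ['SL Malinga', 'SL Malinga', 'A Mishra', 'A Mishra', 'DJ Bravo'],
    'inning': [1, 2, 1, 2, 1],
    'wickets': [69, 74, 48, 76, 61],
    'Total_wickets': [143, 143, 124, 124, 122],
    'matches': [44, 54, 50, 62, 48],
    'balls': [4078, 4735, 3908, 4930, 3887],
})

# Aggregate first: one row per (bowler, inning) pair.
per_group = df.groupby(['bowler', 'inning'])[['wickets', 'balls']].sum()

# Row-wise apply computes the same ratio as transform(..., axis=1) above.
strike = per_group.apply(lambda row: row['balls'] / row['wickets'], axis=1)

# Broadcast back: each original row looks up the value of its group.
out = df.join(strike.rename('strike_rate'), on=['bowler', 'inning'])
print(out)
```

The join keeps the original row order, so `out` lines up with `df` row for row.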

But you could also simply do this:

In [88]: df['strike_rate'] = df.balls / df.wickets
In [89]: df
Out[89]: 
       bowler  inning  wickets  Total_wickets  matches  balls  strike_rate
0  SL Malinga       1       69            143       44   4078    59.101449
1  SL Malinga       2       74            143       54   4735    63.986486
2    A Mishra       1       48            124       50   3908    81.416667
3    A Mishra       2       76            124       62   4930    64.868421
4    DJ Bravo       1       61            122       48   3887    63.721311

EDIT: try the following approach using apply():

Or, if you just want a simple column operation, I prefer Cory's second option.

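Your function has a problem if you do not select the columns with []. transform processes each column separately as a Series, so it cannot work with two columns at once and divide one by the other: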

def f(x):
    print(x)

df = df_bowler.groupby(['bowler','inning']).transform(f)

2    48
Name: wickets, dtype: int64
2    124
Name: Total_wickets, dtype: int64
2    50
Name: matches, dtype: int64
2    3908
Name: balls, dtype: int64
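Because transform works one column at a time, one way to keep using it is to let it broadcast each column's group total separately, then divide afterwards. This is a sketch, not taken from the answer. The totals share the original index, so the ratio can be assigned directly as a new column.

```python
import pandas as pd

df_bowler = pd.DataFrame({
    'bowler': ['SL Malinga', 'SL Malinga', 'A Mishra', 'A Mishra', 'DJ Bravo'],
    'inning': [1, 2, 1, 2, 1],
    'wickets': [69, 74, 48, 76, 61],
    'Total_wickets': [143, 143, 124, 124, 122],
    'matches': [44, 54, 50, 62, 48],
    'balls': [4078, 4735, 3908, 4930, 3887],
})
keys = ['bowler', 'inning']

# transform('sum') returns one value per original row: the total of its group.
totals = df_bowler.groupby(keys)[['balls', 'wickets']].transform('sum')

# Both totals are aligned with df_bowler, so a plain division lines up row by row.
df_bowler['strike rate'] = totals['balls'] / totals['wickets']
print(df_bowler)
```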

If you select the column with []:

def f(x):
    print(x)

df = df_bowler.groupby(['bowler','inning'])['balls'].transform(f)

2    3908
Name: (A Mishra, 1), dtype: int64
3    4930
Name: (A Mishra, 2), dtype: int64
4    3887
Name: (DJ Bravo, 1), dtype: int64
0    4078
Name: (SL Malinga, 1), dtype: int64
1    4735
Name: (SL Malinga, 2), dtype: int64
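With a single column selected, the function receives one Series per group and may return a scalar. transform then repeats that scalar for every row of the group. With the (bowler, inning) keys, every group in this sample is a single row, so the broadcast cannot be seen. Grouping by `bowler` alone, chosen here only for illustration, makes it visible:

```python
import pandas as pd

df_bowler = pd.DataFrame({
    'bowler': ['SL Malinga', 'SL Malinga', 'A Mishra', 'A Mishra', 'DJ Bravo'],
    'inning': [1, 2, 1, 2, 1],
    'wickets': [69, 74, 48, 76, 61],
    'Total_wickets': [143, 143, 124, 124, 122],
    'matches': [44, 54, 50, 62, 48],
    'balls': [4078, 4735, 3908, 4930, 3887],
})

# Each call receives the 'balls' Series of one bowler and returns a scalar;
# transform broadcasts that scalar back to every row of the group.
balls_per_bowler = df_bowler.groupby('bowler')['balls'].transform(lambda s: s.sum())
print(balls_per_bowler.tolist())  # [8813, 8813, 8838, 8838, 3887]
```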

The same function with apply receives each whole group as a DataFrame, so several columns can be used together:

def f(x):
    print(x)

df = df_bowler.groupby(['bowler','inning']).apply(f)

     bowler  inning  wickets  Total_wickets  matches  balls
2  A Mishra       1       48            124       50   3908
     bowler  inning  wickets  Total_wickets  matches  balls
2  A Mishra       1       48            124       50   3908
     bowler  inning  wickets  Total_wickets  matches  balls
3  A Mishra       2       76            124       62   4930
     bowler  inning  wickets  Total_wickets  matches  balls

Conclusion:

If you want to process data per group, you need:

I think you are wrong: yours is not really a transform, because after aggregating with sum the output is a different DataFrame.

I understand why my initial transform failed, thanks. However, I could not get your transform solution to work; I get an AttributeError in transform.

I would have to see its traceback. I just tried it again without doing anything special, and it still works. Check your df.keys().

This summary is very useful. Thanks.