
Python DataFrame: how to calculate a conditional rolling sum?

I have a DataFrame of football data in which each row represents one match. It contains the following columns: 'Date', 'HomeTeam', 'AwayTeam', 'Points_HomeTeam', 'Points_AwayTeam'.

+--------------------------------------------------------------------------+
| 'Date'    'HomeTeam'   'AwayTeam'  'Points_HomeTeam' 'Points_AwayTeam'   |
+--------------------------------------------------------------------------+
| 2000-08-19 Charlton     Man City          0                 3            |
| 2000-08-19 Chelsea      Arsenal           1                 1            |
| 2000-08-23 Coventry     Man City          3                 0            |
| 2000-08-25 Man City     Liverpool         1                 1            |
| 2000-08-28 Derby        Man City          1                 1            |
| 2000-08-31 Leeds        Chelsea           3                 0            |
| 2000-08-31 Man City     Everton           3                 0            |
+--------------------------------------------------------------------------+
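I would like to add a column showing the sum of the points the home team earned in its last two away games, i.e. the sum of the 'Points_AwayTeam' values of the two most recent earlier rows in which 'AwayTeam' equals the 'HomeTeam' of the current row.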

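For example, in the table below, the new column has the value 3 for the first occurrence of 'Man City' in the 'HomeTeam' column (the sum of the 'Points_AwayTeam' values for the two previous occurrences of 'Man City' in 'AwayTeam', i.e. 0 + 3). Similarly, the new column has the value 1 (1 + 0) for the second occurrence of 'Man City' in 'HomeTeam'. All other rows are 'NA', because no other home team has appeared at least twice in the 'AwayTeam' column before that match.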

+-------------------------------------------------------------------------------------+
| 'Date'    'HomeTeam'   'AwayTeam'  'Points_HomeTeam' 'Points_AwayTeam' 'New Column' |
+-------------------------------------------------------------------------------------+
| 2000-08-19 Charlton     Man City          0                 3          NA           |
| 2000-08-19 Chelsea      Arsenal           1                 1          NA           |
| 2000-08-23 Coventry     Man City          3                 0          NA           |
| 2000-08-25 Man City     Liverpool         1                 1          3            |
| 2000-08-28 Derby        Man City          1                 1          NA           |
| 2000-08-31 Leeds        Chelsea           3                 0          NA           |
| 2000-08-31 Man City     Everton           3                 0          1            |
+-------------------------------------------------------------------------------------+
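
To make the requirement concrete, here is a minimal reference sketch that rebuilds the sample table and computes the new column row by row. The helper name `last_two_away_points` and its `n_games` parameter are made up for illustration; the approach is deliberately naive (one filter per row) and is only meant to pin down the expected values, not to scale to large data.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2000-08-19", "2000-08-19", "2000-08-23", "2000-08-25",
        "2000-08-28", "2000-08-31", "2000-08-31",
    ]),
    "HomeTeam": ["Charlton", "Chelsea", "Coventry", "Man City",
                 "Derby", "Leeds", "Man City"],
    "AwayTeam": ["Man City", "Arsenal", "Man City", "Liverpool",
                 "Man City", "Chelsea", "Everton"],
    "Points_HomeTeam": [0, 1, 3, 1, 1, 3, 3],
    "Points_AwayTeam": [3, 1, 0, 1, 1, 0, 0],
})


def last_two_away_points(row, n_games=2):
    """Sum the home team's points from its n_games most recent earlier away games."""
    # Hypothetical helper: earlier matches in which this row's home team played away.
    earlier_away = df[(df["AwayTeam"] == row["HomeTeam"]) & (df["Date"] < row["Date"])]
    if len(earlier_away) < n_games:
        return float("nan")  # not enough away games yet
    return earlier_away.sort_values("Date")["Points_AwayTeam"].tail(n_games).sum()


df["New Column"] = df.apply(last_two_away_points, axis=1)
print(df)
```

Only the two 'Man City' home rows receive a value (3 and 1); every other row stays NaN, matching the expected table.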
I already calculated the sum of the home team's points from its last two home games with the following code:

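rolling_games = 2  # number of previous games to sum

# Rolling sum per group, shifted by one row so the current game is excluded.
f = lambda x: x.rolling(window=rolling_games, min_periods=rolling_games).sum().shift()
df['HomeTeam_HomePoints'] = df.groupby('HomeTeam')['Points_HomeTeam'].apply(f).reset_index(drop=True, level=0)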
How can I calculate a rolling sum across rows based on the value in a separate column?

Many thanks!

Here is a solution:

import pandas as pd

# Assumes df["Date"] is a datetime column and df is sorted by "Date"
# (merge_asof requires a sorted key).
away = df[["Date", "AwayTeam", "Points_AwayTeam"]].copy()

# Create a rolling sum over each team's away games.
away["roll_sum"] = away.groupby("AwayTeam")["Points_AwayTeam"].transform(lambda x: x.rolling(2).sum())

# For every match, we now have to find the last rolling sum
# of 'away' for the 'home' team.
#
# We're going to use merge_asof to do that:
# The first step of this function is to match home teams on the left
# to away teams on the right (done via left_by and right_by).
# Then, for every date on the left, we look for the closest
# (previous) date on the right (this is done by the 'on' argument).
res = pd.merge_asof(df, away, on="Date", left_by="HomeTeam", right_by="AwayTeam", suffixes=["", "_roll"])
res = res.drop(columns=["AwayTeam_roll", "Points_AwayTeam_roll"])
print(res)
Output:

        Date  HomeTeam   AwayTeam  Points_HomeTeam  Points_AwayTeam  roll_sum
0 2000-08-19  Charlton   Man-City                0                3       NaN
1 2000-08-19   Chelsea    Arsenal                1                1       NaN
2 2000-08-23  Coventry   Man-City                3                0       NaN
3 2000-08-25  Man-City  Liverpool                1                1       3.0
4 2000-08-28     Derby   Man-City                1                1       NaN
5 2000-08-31     Leeds    Chelsea                3                0       NaN
6 2000-08-31  Man-City    Everton                3                0       1.0
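
For anyone who wants to run the approach end to end, the following is a self-contained sketch under a few assumptions: the sample data is typed in by hand with dates as strings, and the key is converted and sorted first because `merge_asof` needs a numeric or datetime key that is sorted in both frames. Passing `allow_exact_matches=False` is an optional addition, not part of the answer above; it restricts the lookup to strictly earlier dates.

```python
import pandas as pd

# Sample data from the question, with dates as plain strings.
df = pd.DataFrame({
    "Date": ["2000-08-19", "2000-08-19", "2000-08-23", "2000-08-25",
             "2000-08-28", "2000-08-31", "2000-08-31"],
    "HomeTeam": ["Charlton", "Chelsea", "Coventry", "Man City",
                 "Derby", "Leeds", "Man City"],
    "AwayTeam": ["Man City", "Arsenal", "Man City", "Liverpool",
                 "Man City", "Chelsea", "Everton"],
    "Points_HomeTeam": [0, 1, 3, 1, 1, 3, 3],
    "Points_AwayTeam": [3, 1, 0, 1, 1, 0, 0],
})

# Prepare the key: convert to datetime and sort (stable sort keeps tie order).
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date", kind="mergesort").reset_index(drop=True)

# Rolling sum over each team's last two away games.
away = df[["Date", "AwayTeam", "Points_AwayTeam"]].copy()
away["roll_sum"] = (
    away.groupby("AwayTeam")["Points_AwayTeam"]
        .transform(lambda s: s.rolling(2).sum())
)

# For each match, pick the home team's most recent earlier away row.
res = pd.merge_asof(
    df,
    away[["Date", "AwayTeam", "roll_sum"]],
    on="Date",
    left_by="HomeTeam",
    right_by="AwayTeam",
    suffixes=("", "_roll"),
    allow_exact_matches=False,  # assumption: only strictly earlier dates count
)
# Drop the helper key column from the right frame if it is present.
res = res.drop(columns=["AwayTeam_roll"], errors="ignore")
print(res)
```

Because the rolling sum is computed on the away rows themselves and the lookup only reaches back to earlier dates, no extra `shift()` is needed here.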

Comments:

Could you add some sample data along with the expected output?

Thanks for your comment! I have edited the question to provide more detail.

Thanks for your help! I get the following error: "ValueError: Length mismatch: Expected axis has 7440 elements, new values have 7441 elements" (my original DataFrame has 7441 rows).

Hmm. Are you trying the new version of the code or the original one? Also, which line produces the error? (If you can share the data, I can try to run it on my side.)

I am trying the new version. The second line of code, the groupby-transform, causes the error. I am trying to work out what exactly the problem is... How do I share the data? Sorry, this is my first time posting on Stack Overflow.

Just upload it to Google Drive or something similar and paste the link here. Which version of pandas are you using?

I figured it out: your code works when using apply() instead of transform(). Thanks again for your help!
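
The thread does not show the `apply` variant that finally worked, so the following is only a sketch of what it might look like on the sample data. `group_keys=False` is an assumption added here so that the result keeps the original row index and aligns on assignment. The cause of the length mismatch is not identified in the comments; one possibility, offered purely as a guess, is a row with a missing 'AwayTeam', since `groupby` drops rows with NaN keys by default.

```python
import pandas as pd

# Away-side columns of the sample data from the question.
away = pd.DataFrame({
    "AwayTeam": ["Man City", "Arsenal", "Man City", "Liverpool",
                 "Man City", "Chelsea", "Everton"],
    "Points_AwayTeam": [3, 1, 0, 1, 1, 0, 0],
})

# apply() instead of transform(); group_keys=False keeps the original
# row index on the result so the assignment aligns row by row.
away["roll_sum"] = (
    away.groupby("AwayTeam", group_keys=False)["Points_AwayTeam"]
        .apply(lambda s: s.rolling(2).sum())
)
print(away)
```

Checking `away["AwayTeam"].isna().sum()` on the real data would show whether missing team names are involved.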