Pandas: how to sum the last X values across two columns based on a condition


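I recently started learning pandas. I have tried hard to find a solution to this but could not, so here is the problem.

I have a DataFrame of simple football data. For each team, I want to know how many goals it scored in its previous two matches, regardless of whether it played at home or away. So, for each team, I have to sum a given number of values drawn from two different columns.

Sample data: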

import pandas as pd
data = [['2018-02-03', 'manutd', 'chelsea', 3, 1], ['2018-02-08', 'arsenal', 'liverpool', 1, 1], 
        ['2018-01-12', 'chelsea', 'westham', 2, 0], ['2018-01-12', 'liverpool', 'manutd', 0, 2], 
        ['2018-03-15', 'arsenal', 'chelsea', 2, 2], ['2018-02-20', 'manutd', 'brighton', 0, 0], 
        ['2018-04-01', 'westham', 'fulham', 1, 0], ['2018-03-15', 'manutd', 'westham', 2, 1]] 
df = pd.DataFrame(data, columns = ['event_time', 'home_team', 'away_team', 'home_goals', 'away_goals'])
df['event_time'] = pd.to_datetime(df['event_time'])
df.sort_values(['event_time'],inplace=True, ascending=False)
print(df)


  event_time  home_team  away_team  home_goals  away_goals
6 2018-04-01    westham     fulham           1           0
4 2018-03-15    arsenal    chelsea           2           2
7 2018-03-15     manutd    westham           2           1
5 2018-02-20     manutd   brighton           0           0
1 2018-02-08    arsenal  liverpool           1           1
0 2018-02-03     manutd    chelsea           3           1
2 2018-01-12    chelsea    westham           2           0
3 2018-01-12  liverpool     manutd           0           2

What I want to achieve:

  event_time  home_team  away_team  home_goals  away_goals  h_goals_previous_2  a_goals_previous_2
6 2018-04-01    westham     fulham           1           0                  1                  NaN
4 2018-03-15    arsenal    chelsea           2           2                  1                    3
7 2018-03-15     manutd    westham           2           1                  3                    0
5 2018-02-20     manutd   brighton           0           0                  5                  NaN
1 2018-02-08    arsenal  liverpool           1           1                NaN                    0
0 2018-02-03     manutd    chelsea           3           1                  2                    2
2 2018-01-12    chelsea    westham           2           0                NaN                  NaN
3 2018-01-12  liverpool     manutd           0           2                NaN                  NaN
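
Explanation:
- On 2018-03-15 Arsenal played Chelsea. In its previous two matches Chelsea scored 3 goals in total: 1 away and 2 at home.
- Some of the previous-goal values are NaN because we have no data on earlier matches.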
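The Chelsea figure in the explanation can be checked by hand. The sketch below rebuilds the sample data, keeps Chelsea's matches before 2018-03-15, takes the two most recent, and picks the goals column that matches the side Chelsea played on. The names `team`, `when`, `played`, `last_two` and `goals` are illustrative and not part of the original question.

```python
import pandas as pd

data = [['2018-02-03', 'manutd', 'chelsea', 3, 1], ['2018-02-08', 'arsenal', 'liverpool', 1, 1],
        ['2018-01-12', 'chelsea', 'westham', 2, 0], ['2018-01-12', 'liverpool', 'manutd', 0, 2],
        ['2018-03-15', 'arsenal', 'chelsea', 2, 2], ['2018-02-20', 'manutd', 'brighton', 0, 0],
        ['2018-04-01', 'westham', 'fulham', 1, 0], ['2018-03-15', 'manutd', 'westham', 2, 1]]
df = pd.DataFrame(data, columns=['event_time', 'home_team', 'away_team', 'home_goals', 'away_goals'])
df['event_time'] = pd.to_datetime(df['event_time'])

team, when = 'chelsea', pd.Timestamp('2018-03-15')

# All matches the team played (home or away) strictly before the given date.
played = df[((df['home_team'] == team) | (df['away_team'] == team))
            & (df['event_time'] < when)]

# The two most recent of those matches.
last_two = played.sort_values('event_time').tail(2)

# Take home_goals where the team was at home, away_goals otherwise.
goals = last_two['home_goals'].where(last_two['home_team'] == team,
                                     last_two['away_goals'])
print(goals.sum())  # 3
```

This is only a single-cell check; repeating it for every row is exactly the loop the question wants to avoid.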

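I tried to do this by iterating team by team: for each team I build a sorted subset of df, from which I could then aggregate the values. But I feel this is not the best solution, and that it could be done with a nicer expression: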

teams = pd.unique(df[['home_team', 'away_team']].values.ravel('K'))
for team in teams:
    print(team)
    team_df = df[(df['home_team']==team) | (df['away_team']==team)]
    # The column is named 'event_time'; reassign instead of sorting a slice in place.
    team_df = team_df.sort_values(['event_time'], ascending=False)
    print(team_df)

How can I do this without writing loops and ifs?

Method 1:

# Create df2 with the original index kept as a column, and rename the
# columns so that pd.wide_to_long can be applied

df2=df.set_index('event_time',append=True)
df2.columns=[''.join(name[::-1]) for name in  df2.columns.str.split('_')]
df2.columns=df2.columns.str.replace('home','1').str.replace('away','2')
df2=df2.reset_index()

#Using pd.wide_to_long
df_long=( pd.wide_to_long(df2,['team','goals'],i='level_0',j='key')
          .sort_values('event_time',ascending=False) )
print(df_long)


            event_time       team  goals
level_0 key                             
6       1   2018-04-01    westham      1
        2   2018-04-01     fulham      0
4       1   2018-03-15    arsenal      2
7       1   2018-03-15     manutd      2
4       2   2018-03-15    chelsea      2
7       2   2018-03-15    westham      1
5       1   2018-02-20     manutd      0
        2   2018-02-20   brighton      0
1       1   2018-02-08    arsenal      1
        2   2018-02-08  liverpool      1
0       1   2018-02-03     manutd      3
        2   2018-02-03    chelsea      1
2       1   2018-01-12    chelsea      2
3       1   2018-01-12  liverpool      0
2       2   2018-01-12    westham      0
3       2   2018-01-12     manutd      2

  event_time  home_team  away_team  home_goals  away_goals  \
6 2018-04-01    westham     fulham           1           0   
4 2018-03-15    arsenal    chelsea           2           2   
7 2018-03-15     manutd    westham           2           1   
5 2018-02-20     manutd   brighton           0           0   
1 2018-02-08    arsenal  liverpool           1           1   
0 2018-02-03     manutd    chelsea           3           1   
2 2018-01-12    chelsea    westham           2           0   
3 2018-01-12  liverpool     manutd           0           2   

   h_goals_previous_2  a_goals_previous_2  
6                 1.0                 NaN  
4                 NaN                 3.0  
7                 3.0                 NaN  
5                 5.0                 NaN  
1                 NaN                 NaN  
0                 NaN                 NaN  
2                 NaN                 NaN  
3                 NaN                 NaN  
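The step that turns `df_long` into the two new columns is not shown above. The following is a minimal sketch of how it could be done, built around the expression `groups_goals.shift(-1) + groups_goals.shift(-2)` quoted in the comments. It is a reconstruction under assumptions, not necessarily the answer's exact code: the column renaming is done with an explicit mapping, and `wide`, `match_id`, `prev` and `result` are names chosen here for illustration.

```python
import pandas as pd

data = [['2018-02-03', 'manutd', 'chelsea', 3, 1], ['2018-02-08', 'arsenal', 'liverpool', 1, 1],
        ['2018-01-12', 'chelsea', 'westham', 2, 0], ['2018-01-12', 'liverpool', 'manutd', 0, 2],
        ['2018-03-15', 'arsenal', 'chelsea', 2, 2], ['2018-02-20', 'manutd', 'brighton', 0, 0],
        ['2018-04-01', 'westham', 'fulham', 1, 0], ['2018-03-15', 'manutd', 'westham', 2, 1]]
df = pd.DataFrame(data, columns=['event_time', 'home_team', 'away_team', 'home_goals', 'away_goals'])
df['event_time'] = pd.to_datetime(df['event_time'])
df = df.sort_values('event_time', ascending=False)

# Long layout: one row per (match, side); key 1 = home, key 2 = away.
wide = (df.rename(columns={'home_team': 'team1', 'away_team': 'team2',
                           'home_goals': 'goals1', 'away_goals': 'goals2'})
          .rename_axis('match_id')   # illustrative name for the original row index
          .reset_index())
df_long = (pd.wide_to_long(wide, ['team', 'goals'], i='match_id', j='key')
             .sort_values('event_time', ascending=False))

# Rows are newest-first, so within a team shift(-1) is the previous match
# and shift(-2) the one before it. The sum is NaN unless both exist.
groups_goals = df_long.groupby('team')['goals']
value_2_sum = groups_goals.shift(-1) + groups_goals.shift(-2)

# Back to one row per match: key 1 becomes the home column, key 2 the away column.
prev = value_2_sum.unstack('key').sort_index(axis=1)
prev.columns = ['h_goals_previous_2', 'a_goals_previous_2']
result = df.join(prev)
print(result)
```

Because the plain `+` needs both shifted values, a team with only one earlier match gets NaN here, whereas the desired output in the question shows a partial sum for such teams.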

Method 2:

Output:

  event_time  home_team  away_team  home_goals  away_goals  \
6 2018-04-01    westham     fulham           1           0   
4 2018-03-15    arsenal    chelsea           2           2   
7 2018-03-15     manutd    westham           2           1   
5 2018-02-20     manutd   brighton           0           0   
1 2018-02-08    arsenal  liverpool           1           1   
0 2018-02-03     manutd    chelsea           3           1   
2 2018-01-12    chelsea    westham           2           0   
3 2018-01-12  liverpool     manutd           0           2   

   h_goals_previous_2  a_goals_previous_2  
6                 1.0                 NaN  
4                 NaN                 3.0  
7                 3.0                 NaN  
5                 5.0                 NaN  
1                 NaN                 NaN  
0                 NaN                 NaN  
2                 NaN                 NaN  
3                 NaN                 NaN  

Note that there are more NaN values here because I only used the rows shown in the DataFrame.

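Can you explain what "regardless of whether they played at home or away" means? If that is the case, why do you have two goals_previous_2 columns, one for home and one for away? Also, if you want faster feedback, I suggest filling in both columns of the expected output.

The output has been updated. I want the figure for both the home team and the away team: how many goals each of them scored in its previous two matches. Explanation of the calculation: on 2018-03-15 Arsenal played Chelsea. In its previous two matches Chelsea scored 3 goals in total: 1 away and 2 at home. Some of the previous-goal values are NaN because we have no data on earlier matches.

Thanks! This works, and I learned a lot. I have a question about parameterizing it for the N previous matches, for example the last 8 on a large dataset. What could be used instead of
value_2_sum = groups_goals.shift(-1) + groups_goals.shift(-2)
? I tried
groups_goals.shift(-8).rolling(8).sum()
but could not get it to work correctly on the grouped series. If I use shift and rolling directly on df['goals'], the sums are computed correctly (but that is not what I want to achieve), whereas after applying them to groups_goals I get "strange" results, possibly an index problem.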
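One likely cause of the "strange" results: a grouped `shift` returns an ordinary Series, so the `rolling` call that follows runs over the whole column and crosses team boundaries. In addition, `shift(-8)` moves each value eight rows rather than one. A minimal sketch of one way to generalize to N matches is to sort oldest-first and do both the shift and the rolling sum inside each group with `transform`. The helper `add_previous_goals`, its parameters `n` and `min_periods`, and the `match_id` and `side` columns are hypothetical names introduced for this example.

```python
import pandas as pd

def add_previous_goals(df, n=2, min_periods=1):
    """Hypothetical helper: goals each side scored in its previous n matches."""
    sides = []
    for side in ('home', 'away'):
        part = df[['event_time', f'{side}_team', f'{side}_goals']].copy()
        part.columns = ['event_time', 'team', 'goals']
        part['side'] = side[0]        # 'h' or 'a'
        part['match_id'] = df.index   # remember which match the row came from
        sides.append(part)
    long = pd.concat(sides, ignore_index=True).sort_values('event_time', kind='stable')

    # Per team, oldest first: shift() excludes the current match, then the
    # rolling window sums up to n earlier matches.
    long['prev'] = long.groupby('team')['goals'].transform(
        lambda s: s.shift().rolling(n, min_periods=min_periods).sum())

    out = df.copy()
    for side in ('h', 'a'):
        sel = long[long['side'] == side].set_index('match_id')['prev']
        out[f'{side}_goals_previous_{n}'] = sel   # aligned on the match index
    return out

data = [['2018-02-03', 'manutd', 'chelsea', 3, 1], ['2018-02-08', 'arsenal', 'liverpool', 1, 1],
        ['2018-01-12', 'chelsea', 'westham', 2, 0], ['2018-01-12', 'liverpool', 'manutd', 0, 2],
        ['2018-03-15', 'arsenal', 'chelsea', 2, 2], ['2018-02-20', 'manutd', 'brighton', 0, 0],
        ['2018-04-01', 'westham', 'fulham', 1, 0], ['2018-03-15', 'manutd', 'westham', 2, 1]]
df = pd.DataFrame(data, columns=['event_time', 'home_team', 'away_team', 'home_goals', 'away_goals'])
df['event_time'] = pd.to_datetime(df['event_time'])
df = df.sort_values('event_time', ascending=False)

partial = add_previous_goals(df, n=2, min_periods=1)  # sums whatever history exists
strict = add_previous_goals(df, n=2, min_periods=2)   # NaN unless n matches exist
print(partial)
```

With `min_periods=1` a team that has only one earlier match still gets a value, which is how the desired output in the question is filled in; with `min_periods=n` the result is NaN until a full window exists. For the last 8 matches, `add_previous_goals(df, n=8)` would be the corresponding call.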