Python: finding the running average for a given team
Tags: python, pandas, numpy, rolling-average

Suppose I have some football-related data:

Date   Home     Away  HomeGoal AwayGoal TotalGoal
2019   Arsenal  MU     5        1        6
2019   MCity    Liv    2        2        4
2019   MU       Liv    3        4        7
2019   MCity    MU     0        0        0

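I want to create a column showing the average number of goals a team scored in its last two matches. For example, in the last row I want a column showing MU's average goals over its previous 2 matches, i.e. (1 + 3) / 2 = 2.

Is there a function in Python that can do this?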
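For reproducibility, the table above can be rebuilt as a DataFrame and the target value for the last row checked by hand. This is only a sketch: the variable names `mu_previous` and `expected` are illustrative and not part of the question.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "Date": [2019, 2019, 2019, 2019],
    "Home": ["Arsenal", "MCity", "MU", "MCity"],
    "Away": ["MU", "Liv", "Liv", "MU"],
    "HomeGoal": [5, 2, 3, 0],
    "AwayGoal": [1, 2, 4, 0],
})
df["TotalGoal"] = df["HomeGoal"] + df["AwayGoal"]

# MU's goals in the two matches before the last row:
# 1 (away at Arsenal) and 3 (home against Liv)
mu_previous = [1, 3]
expected = sum(mu_previous) / len(mu_previous)
print(expected)  # 2.0
```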

Try this:

Split the data into two dataframes, one for the `Home` goals and one for the `Away` goals:

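import pandas as pd

# Take explicit copies so the columns can be renamed safely
df1 = df[['Date', 'Home', 'HomeGoal']].copy()
df2 = df[['Date', 'Away', 'AwayGoal']].copy()

all_dfs = [df1, df2]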

Rename the columns:

for dfs in all_dfs:
    dfs.columns = ['Date','Team', 'Goal']

Concatenate the two dataframes:

new_df = pd.concat(all_dfs, ignore_index=True)
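The three steps above can be sketched end to end as follows. This is a minimal, self-contained version under one added assumption: because every `Date` in the sample is 2019, the original row label is kept in a hypothetical `Match` column so that match order survives the concatenation. The names `home`, `away`, `long_df` and `Match` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [2019, 2019, 2019, 2019],
    "Home": ["Arsenal", "MCity", "MU", "MCity"],
    "Away": ["MU", "Liv", "Liv", "MU"],
    "HomeGoal": [5, 2, 3, 0],
    "AwayGoal": [1, 2, 4, 0],
})

# One dataframe per side, with matching column names
home = df[["Date", "Home", "HomeGoal"]].copy()
away = df[["Date", "Away", "AwayGoal"]].copy()
home.columns = ["Date", "Team", "Goal"]
away.columns = ["Date", "Team", "Goal"]

# Assumption: the row label of `df` identifies the match and its order
home["Match"] = df.index
away["Match"] = df.index

# Stack both sides, then restore match order with a stable sort
long_df = (
    pd.concat([home, away], ignore_index=True)
    .sort_values("Match", kind="stable")
    .reset_index(drop=True)
)
print(long_df)
```

With the rows back in match order, MU's goals read 1, 3, 0, which is the sequence any "last N matches" calculation needs.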

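Average goals over MU's last two matches (this assumes `Date` orders the matches; `tail(2)` takes the two most recent rows, whereas `[:2]` would take the two earliest):

new_df[new_df['Team'] == 'MU'].sort_values('Date')['Goal'].tail(2).mean()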

Total goals scored by each team across home and away matches:

new_df.groupby('Team')['Goal'].sum()

The result is a Series of total goals indexed by `Team`.

Based on your requirements, you don't care whether a team plays at home or away, only how many goals it scores in each match. Try this:

# Rename the columns to make the unstacking operation a bit easier
# Always a good idea to specify an explicit `copy` when you intend
# to change the dataframe structure
>>> tmp = df[['Home', 'Away', 'HomeGoal', 'AwayGoal']].copy()

# Arrange the columns into a MultiIndex to make stacking easier
>>> tmp.columns = pd.MultiIndex.from_product([['Team', 'Goal'], ['Home', 'Away']])

# This is what `tmp` looks like:

           Team      Goal     
      Home Away Home Away
0  Arsenal   MU    5    1
1    MCity  Liv    2    2
2       MU  Liv    3    4
3    MCity   MU    0    0

# And now the magic
>>> tmp.stack() \
        .groupby('Team').rolling(2).mean() \
        .groupby('Team').tail(1) \
        .droplevel([1,2])

# Result
         Goal
Team         
Arsenal   NaN
Liv       3.0
MCity     1.0
MU        1.5

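Here is how it works:

- `stack` unpivots `Home` and `Away` so that, for each match, we have two rows of `Team` and `Goal`
- `groupby('Team').rolling(2).mean()` gets the rolling average of the goals each team scored over its last 2 matches
- `groupby('Team').tail(1)` gets the last rolling average for each team
- At this point, the intermediate dataframe has 3 levels in its index: the team name, the match number, and the home/away indicator of the last match played. We only care about the first, so we drop the other two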
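To see what each step contributes, here is a minimal sketch on a hand-built long table, which is roughly the shape the unpivot step produces. It assumes the rows are already in match order; `long_df` and the `Match` column are illustrative names, and the table is typed in by hand rather than produced by `stack`.

```python
import pandas as pd

# One row per team per match, in match order
long_df = pd.DataFrame({
    "Match": [0, 0, 1, 1, 2, 2, 3, 3],
    "Team": ["Arsenal", "MU", "MCity", "Liv", "MU", "Liv", "MCity", "MU"],
    "Goal": [5, 1, 2, 2, 3, 4, 0, 0],
})

# Rolling mean over each team's last 2 matches.
# The result is indexed by (Team, original row label).
rolling_mean = long_df.groupby("Team")["Goal"].rolling(2).mean()
print(rolling_mean.loc["MU"].tolist())  # [nan, 2.0, 1.5]

# Keep only the most recent value per team and drop the row-label level
latest = rolling_mean.groupby(level=0).tail(1).droplevel(1)
print(latest)
```

A team with fewer than two matches (Arsenal here) gets `NaN`, because `rolling(2)` needs two observations by default.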

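So it no longer matters whether the team is home or away? Shouldn't the last row be 1.5, since MU scored 0 and before that 3? 3/2 = 1.5

Could you show a dataframe with the expected output?

@MattR It doesn't matter; I intend to find the team's average goals over its last N matches. Because the last row is the result data, I don't include it when training the model. So an average of 2 for MU is correct.

@sammywemmy My expected output should be the actual number of goals. I'm not at home right now, but the data also includes other football statistics from the matches.

Create two dataframes, one with `Home` and `HomeGoal` as columns and the other with `Away` and `AwayGoal` as columns, then stack them together under the new columns `Team` and `Goal`, then use `groupby('Team')['Goal'].sum()` to calculate the total goals.

How do you get the average over the last two matches that the OP asked for? Does this answer the question?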
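Following the clarification in the comments (the current match is the value to predict, so it must be excluded), one possible sketch is to shift each team's goals by one match before taking the rolling mean, then write the result back as one column per side. This is not taken from either answer; `Match`, `Side`, `PrevAvg`, `HomePrevAvg` and `AwayPrevAvg` are hypothetical names, and the row order of `df` is assumed to be the match order.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [2019, 2019, 2019, 2019],
    "Home": ["Arsenal", "MCity", "MU", "MCity"],
    "Away": ["MU", "Liv", "Liv", "MU"],
    "HomeGoal": [5, 2, 3, 0],
    "AwayGoal": [1, 2, 4, 0],
})

# Long format: one row per team per match, tagged with match label and side
home = df[["Home", "HomeGoal"]].rename(columns={"Home": "Team", "HomeGoal": "Goal"})
away = df[["Away", "AwayGoal"]].rename(columns={"Away": "Team", "AwayGoal": "Goal"})
home["Match"], home["Side"] = df.index, "Home"
away["Match"], away["Side"] = df.index, "Away"
long_df = (
    pd.concat([home, away], ignore_index=True)
    .sort_values("Match", kind="stable")
    .reset_index(drop=True)
)

# shift(1) drops the current match; rolling(2) then averages the two before it
long_df["PrevAvg"] = long_df.groupby("Team")["Goal"].transform(
    lambda s: s.shift(1).rolling(2).mean()
)

# Back to one row per match, with one column per side
wide = long_df.pivot(index="Match", columns="Side", values="PrevAvg")
df["HomePrevAvg"] = wide["Home"]
df["AwayPrevAvg"] = wide["Away"]
print(df)
```

On the sample data only the last row has two earlier matches for one of its teams, so `AwayPrevAvg` is 2.0 there (MU: (1 + 3) / 2) and every other entry is `NaN`.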

Output of `new_df.groupby('Team')['Goal'].sum()` from the first answer:

Team
Arsenal    5
Liv        6
MU         4
Mcity      2
