Python rolling function adding values

python python-3.x pandas

I have a fairly standard function that produces very strange results. I thought I knew what was going on, but now I'm not so sure.

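Essentially, I want to use the rolling function to create a simple rolling average of the previous two values. When I do this directly, it seems to take the first value from somewhere else in the frame. When I do it in a loop, I have no idea where the value comes from.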

Sample data:

player      game_id     game_order  TOI_comp   G_comp
A.J..GREER  2016020227  37          16.566667  0
            2016020251  36          11.733333  0
            2016020268  35          12.700000  0
            2016020278  34          15.433333  0
            2016020296  33          11.850000  0

player_avgs_base.sort_values(by=['player','game_order'],ascending=False, inplace=True)

avgtoi = player_avgs_base["TOI_comp"].rolling(2).mean().shift()
avgtoi

player         game_id     game_order
ZENON.KONOPKA  2013021047  2                   NaN
A.J..GREER     2016020268  35                  NaN
               2016020278  34             9.308333
               2016020296  33            14.066667
               2017020134  32            13.641667
               2017020149  31            10.108333
               2017020165  30             7.175000
               2017020194  29             6.100000

I expected something more like this:

player         game_id     game_order
A.J..GREER     2016020251  36                  NaN
               2016020268  35                  NaN
               2016020278  34                12.22
               2016020296  33            14.066667
               2017020134  32            13.641667
               2017020149  31            10.108333
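It helps to see what the expression computes when only one player is involved. Below is a minimal sketch on a plain Series holding A.J..GREER's five TOI_comp values from the sample, ordered from game_order 37 down to 33. It assumes, as the asker explains in the comments, that a higher game_order is an older game. On its own, rolling(2).mean() includes the current game. The trailing shift() is what limits each row to the two games before it.

```python
import pandas as pd

# One player's TOI_comp values, oldest game first (game_order 37 down to 33)
toi = pd.Series([16.566667, 11.733333, 12.700000, 15.433333, 11.850000])

# rolling(2).mean() averages each row with the row directly before it
current_and_previous = toi.rolling(2).mean()

# shift() moves that result down one row, so each row holds the mean of
# the two rows *before* it and never includes its own value
previous_two = current_and_previous.shift()
print(previous_two)
```

The first two rows are NaN because they have fewer than two earlier games. The third row is the mean of the first two values.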

I think this is a sorting problem. Try the following and see whether it fixes your issue:

player_avgs_base.sort_values(["player","game_order"], ascending=False, inplace=True)

If you like, you can set the index after sorting.
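Here is a minimal sketch of "sort first, then set the index", using a small hypothetical frame built from the sample rows. Note that ascending=False applies to both keys, so players also come out in reverse alphabetical order. That is likely why ZENON.KONOPKA appears first in the question's output. For the rolling calculation, what matters is that each player's rows are contiguous and in game order. With the asker's convention (game 1 is the most recent), descending game_order means oldest game first.

```python
import pandas as pd

# Small hypothetical frame with the same columns as the sample data
df = pd.DataFrame({
    "player": ["A.J..GREER", "ZENON.KONOPKA", "A.J..GREER", "A.J..GREER"],
    "game_id": [2016020227, 2013021047, 2016020251, 2016020268],
    "game_order": [37, 34, 36, 35],
    "TOI_comp": [16.566667, 12.666666, 11.733333, 12.700000],
})

# Sort first: each player's rows become contiguous, ordered from the
# highest game_order (oldest game) to the lowest (most recent game)
df_sorted = df.sort_values(["player", "game_order"], ascending=False)

# Setting the index afterwards only relabels the rows; it does not reorder them
df_indexed = df_sorted.set_index(["player", "game_id", "game_order"])
print(df_indexed)
```

To keep players in alphabetical order while still sorting games from oldest to newest, sort_values also accepts a list, for example ascending=[True, False].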

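Another point: in your code, rolling does not respect the groups. I'm guessing you want the rolling average for each player separately, rather than mixing in values from other players, right? If so, you can use the following code: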

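# Sort so that each player's rows are contiguous and in game order
df2 = df.sort_values(["player", 'game_id', "game_order"])
# transform() returns a result aligned with df2's index, so it can be assigned directly
df2['TOI_comp_avg_lt'] = df2.groupby('player')['TOI_comp'].transform(lambda ser: ser.rolling(2).mean().shift())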

This produces:

          player     game_id  game_order   TOI_comp  G_comp  TOI_comp_avg_lt
0     A.J..GREER  2016020227          37  16.566667       0              NaN
2     A.J..GREER  2016020251          36  11.733333       0              NaN
4     A.J..GREER  2016020268          35  12.700000       0        14.150000
6     A.J..GREER  2016020278          34  15.433333       0        12.216666
7     A.J..GREER  2016020296          33  11.850000       0        14.066666
1  ZENON.KONOPKA  2013021047          34  12.666666       0              NaN
5  ZENON.KONOPKA  2013021047          35  14.722222       0              NaN
3  ZENON.KONOPKA  2013021047          37  13.111111       0        13.694444
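An alternative to the lambda is to use GroupBy.rolling directly and then shift within each player. Here is a minimal sketch, assuming the rows are already in chronological order within each player. The small frame is hypothetical and reuses a few sample values. GroupBy.rolling returns a result indexed by (player, original row label), so the player level is dropped before assigning the result back.

```python
import pandas as pd

df = pd.DataFrame({
    "player": ["A.J..GREER", "ZENON.KONOPKA", "A.J..GREER", "ZENON.KONOPKA", "A.J..GREER"],
    "TOI_comp": [16.566667, 12.666666, 11.733333, 13.111111, 12.700000],
})

# GroupBy.rolling keeps every 2-row window inside a single player;
# the result has a (player, original row label) MultiIndex
rolled = df.groupby("player")["TOI_comp"].rolling(2).mean()

# Shift within each player as well, so a row sees the mean of the two
# previous games of the same player and never the current game
shifted = rolled.groupby(level="player").shift()

# Drop the player level so the index matches df's row labels again
df["TOI_comp_avg_lt"] = shifted.droplevel("player")
print(df)
```

The two approaches should give the same numbers. The transform version is shorter, while this one avoids a Python-level lambda per group.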

given the following test data:

import pandas as pd
import io

raw= """A.J..GREER     2016020227  37  16.566667   0
ZENON.KONOPKA  2013021047  34  12.666666   0
A.J..GREER     2016020251  36  11.733333   0
ZENON.KONOPKA  2013021047  37  13.111111   0
A.J..GREER     2016020268  35  12.700000   0
ZENON.KONOPKA  2013021047  35  14.722222   0
A.J..GREER     2016020278  34  15.433333   0
A.J..GREER     2016020296  33  11.850000   0"""

df = pd.read_csv(io.StringIO(raw), sep=r'\s+', names=['player', 'game_id', 'game_order', 'TOI_comp', 'G_comp'])

By the way, your set_index is not a substitute for sorting. The index has no effect on the output. For example, if you take the df defined above and run:

df_indexed= df.set_index(["player",'game_id',"game_order"]) 
df_indexed_result= df_indexed.copy()
df_indexed_result['TOI_comp_shifted']= df_indexed["TOI_comp"].shift()
df_indexed_result['TOI_comp_rolling_mean']= df_indexed["TOI_comp"].rolling(2).mean().shift()

you get:

                                      TOI_comp  G_comp  TOI_comp_shifted  TOI_comp_rolling_mean
player        game_id    game_order                                                            
A.J..GREER    2016020227 37          16.566667       0               NaN                    NaN
ZENON.KONOPKA 2013021047 34          12.666666       0         16.566667                    NaN
A.J..GREER    2016020251 36          11.733333       0         12.666666              14.616667
ZENON.KONOPKA 2013021047 37          13.111111       0         11.733333              12.200000
A.J..GREER    2016020268 35          12.700000       0         13.111111              12.422222
ZENON.KONOPKA 2013021047 35          14.722222       0         12.700000              12.905555
A.J..GREER    2016020278 34          15.433333       0         14.722222              13.711111
              2016020296 33          11.850000       0         15.433333              15.077777

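If you look at the TOI_comp_shifted column, you will see that each row is simply filled with the value from the previous row, regardless of which player it belongs to (the same applies to the rolling mean). So the index has no effect on this operation.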
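If the data is already indexed by player, as in the asker's player_avgs_base, the index can still be used for grouping. Here is a minimal sketch, using the first five rows of the test data above as a hypothetical MultiIndexed frame. It assumes the rows are already in the desired order within each player. groupby(level="player") makes both shift() and the rolling mean stay inside each player.

```python
import pandas as pd

# Hypothetical frame indexed like the asker's player_avgs_base
df_indexed = pd.DataFrame(
    {"TOI_comp": [16.566667, 12.666666, 11.733333, 13.111111, 12.700000]},
    index=pd.MultiIndex.from_tuples(
        [
            ("A.J..GREER", 2016020227, 37),
            ("ZENON.KONOPKA", 2013021047, 34),
            ("A.J..GREER", 2016020251, 36),
            ("ZENON.KONOPKA", 2013021047, 37),
            ("A.J..GREER", 2016020268, 35),
        ],
        names=["player", "game_id", "game_order"],
    ),
)

grouped = df_indexed.groupby(level="player")["TOI_comp"]

# shift() now only moves values within the same player
df_indexed["TOI_comp_shifted"] = grouped.shift()

# transform() applies the rolling mean per player and keeps the original row alignment
df_indexed["TOI_comp_rolling_mean"] = grouped.transform(lambda s: s.rolling(2).mean().shift())
print(df_indexed)
```

Unlike the ungrouped version, the second row (ZENON.KONOPKA's first game) now gets NaN instead of A.J..GREER's value.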

As for your second question: I think the loop should work like this, provided the column names in your DataFrame are OK:

group_obj = df2.groupby('player')
for col in ['TOI_comp', 'G_comp']:
    # One new "<col>_lt" column per input column, computed within each player
    df2[f'{col}_lt'] = group_obj[col].transform(lambda ser: ser.rolling(2).mean().shift())

This assumes you want to apply the rolling mean to a list of columns in the same way.
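With around 100 columns, an explicit loop is not strictly needed: a DataFrameGroupBy can transform several columns in one call. Here is a minimal sketch. The frame is hypothetical, the G_comp values are made up so the result is non-trivial, and stat_cols stands in for the real column list. Rows are assumed to be in chronological order within each player.

```python
import pandas as pd

df2 = pd.DataFrame({
    "player": ["A.J..GREER"] * 3 + ["ZENON.KONOPKA"] * 2,
    "TOI_comp": [16.566667, 11.733333, 12.700000, 12.666666, 14.722222],
    "G_comp": [0, 1, 0, 0, 2],  # made-up values for illustration
})

# Hypothetical list of stat columns; in practice this could hold ~100 names
stat_cols = ["TOI_comp", "G_comp"]

# transform() runs the lambda on each player's sub-frame; rolling() and
# shift() work column by column, and the result is aligned with df2's index
avgs = df2.groupby("player")[stat_cols].transform(lambda s: s.rolling(2).mean().shift())

# Add an "_lt" suffix to the new columns and attach them in one step
df2 = df2.join(avgs.add_suffix("_lt"))
print(df2)
```

This also avoids repeatedly inserting single columns into a wide frame, which pandas can warn about as fragmentation when done many times.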

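It might be worth noting that the first few columns are the index, set with player_avgs_base.set_index(["player","game_id","game_order"], inplace=True). I forgot to mention that I had also run player_avgs_base.sort_values(by=["player","game_order"], ascending=False, inplace=True) at the top; I'm adding that now. I completely agree that the problem is that it doesn't "respect my groupings" (a great way of putting it, by the way). I'll try the lambda part. Thanks for the information.

Is game_order unique per game_id, so that each game_order identifies exactly one game? I changed the sort order above accordingly.

Game order is unique per player. Each player has a game 1, which is the most recent, up to game N, where N is the total number of games they have played. So the function is meant to produce the average of the last two games. Currently I'm trying apply(lambda ser: ser.rolling(2).mean().shift()). I also tried it on the frame itself with player_avgs_base.groupby('player').apply(lambda _df: _df.sort_values(by=['game_order']).TOI_comp.rolling(2).mean().shift()), which seems to work (I'm verifying it now). Adding it to the loop fails. The challenge is still that I need a loop, because I'll be generating this for about 100 columns, and writing the function out 100 times would be very un-pythonic. Thanks.

It sounds like you have several columns with the same name. Have you checked the output of df.dtypes? If the columns are fine, you can try the code above (see the last few lines) to apply the same logic to a list of columns.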
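Besides df.dtypes, you can list duplicated column names directly. Here is a minimal sketch on a hypothetical frame where TOI_comp appears twice. With a duplicated name, df[col] returns a DataFrame instead of a Series. That is one plausible reason a per-column loop fails even though the single-column version works.

```python
import pandas as pd

# Hypothetical frame where a column name accidentally appears twice
df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=["TOI_comp", "G_comp", "TOI_comp"])

# Selecting a duplicated name returns a DataFrame, not a Series
print(type(df["TOI_comp"]))

# List the duplicated names before running the loop
dupes = df.columns[df.columns.duplicated()].unique().tolist()
print(dupes)

# One possible fix: keep only the first occurrence of each name
df = df.loc[:, ~df.columns.duplicated()]
```

Whether you keep the first occurrence, rename, or drop the duplicates depends on where they came from, for example an earlier merge or concat.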