Python: aggregating a fixed number of rows in a pandas DataFrame

python sql pandas

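I am working with some data where, for each horse, I want the finishing positions (finishing_position) from its most recent runs: at most the 6 runs before the current one. The order of the runs is defined by race_id.

Is there a way to use groupby and agg so that only the 6 preceding values are aggregated?

The DataFrame looks like this: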

finishing_position  horse_id    race_id
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012
 2                  K01         2014021
 3                  K01         2014031
 1                  M01         2015011
 2                  K01         2016012
 1                  K02         2016012
 3                  M01         2016012
 4                  J01         2016012
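
For anyone who wants to reproduce the problem, the sample above can be rebuilt roughly as follows. This is only a sketch: the integer dtypes and the stable sort on race_id are assumptions, not something stated in the question.

```python
import pandas as pd

# Sample data copied from the table above (dtypes are an assumption).
df = pd.DataFrame({
    "finishing_position": [1, 2, 3, 4, 2, 3, 1, 2, 1, 3, 4],
    "horse_id": ["K01", "K02", "M01", "K01", "K01", "K01",
                 "M01", "K01", "K02", "M01", "J01"],
    "race_id": [2014011, 2014011, 2014011, 2014012, 2014021, 2014031,
                2015011, 2016012, 2016012, 2016012, 2016012],
})

# race_id defines the run order, so sort on it before looking back.
# A stable sort keeps the original order of rows within the same race.
df = df.sort_values("race_id", kind="stable").reset_index(drop=True)
```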

I would like the result to be:

finishing_position  horse_id    race_id     recent
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012     1
 2                  K01         2014021     1/4
 3                  K01         2014031     1/4/2
 1                  M01         2015011     3
 2                  K01         2016012     1/4/2/3
 1                  K02         2016012     2
 3                  M01         2016012     3/1
 4                  J01         2016012

We can use cumsum together with groupby:

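# Append '/' to each position so that a string cumsum builds '1/4/2/'.
df['recent'] = df.finishing_position.astype(str) + '/'
# Per horse: accumulate, shift down one row to exclude the current run, drop the trailing '/'.
# group_keys=False keeps the original index on pandas >= 2.0 so the assignment aligns.
df['recent'] = df.groupby('horse_id', group_keys=False).recent.apply(lambda x: x.cumsum().shift().str[:-1].fillna(''))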
df
Out[140]: 
    finishing_position horse_id  race_id   recent
0                    1      K01  2014011         
1                    2      K02  2014011         
2                    3      M01  2014011         
3                    4      K01  2014012        1
4                    2      K01  2014021      1/4
5                    3      K01  2014031    1/4/2
6                    1      M01  2015011        3
7                    2      K01  2016012  1/4/2/3
8                    1      K02  2016012        2
9                    3      M01  2016012      3/1
10                   4      J01  2016012
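
To see why this works, it may help to trace the three steps on a single horse. The sketch below uses K01's five finishing positions from the sample; building the Series by hand with object dtype is an assumption made only to keep the example self-contained.

```python
import pandas as pd

# K01's finishing positions in race order, each with a trailing "/".
s = pd.Series(["1/", "4/", "2/", "3/", "2/"], dtype=object)

running = s.cumsum()                    # "1/", "1/4/", "1/4/2/", ...
previous = running.shift()              # move down one row: each run sees only earlier runs
recent = previous.str[:-1].fillna("")   # drop the trailing "/", blank out the first run

print(recent.tolist())
# ['', '1', '1/4', '1/4/2', '1/4/2/3']
```

Note that nothing here caps the history at 6 runs; the string simply keeps growing with every run.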

A revision of @Wen's answer so that the accumulated string covers only the N previous records:

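df['recent'] = df.finishing_position.astype(str) + '/'
# group_keys=False keeps the original index on pandas >= 2.0 so the assignment aligns.
df['recent'] = df.groupby('horse_id', group_keys=False).recent.apply(lambda x: x.cumsum().shift().str[:-1].fillna(''))

def last_n_record(string, recent_no):
    """Keep only the last recent_no '/'-separated entries of string."""
    count = string.count('/')
    if count + 1 >= recent_no:
        # Split off the surplus leading entries and keep the remainder.
        return string.split('/', count - recent_no + 1)[-1]
    else:
        return string

recent_no = 3  # Take the 3 most recent records as a demo
df['recent'] = df.recent.apply(lambda x: last_n_record(x, recent_no))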
df
    finishing_position horse_id  race_id recent
0                    1      K01  2014011       
1                    2      K02  2014011       
2                    3      M01  2014011       
3                    4      K01  2014012      1
4                    2      K01  2014021    1/4
5                    3      K01  2014031  1/4/2
6                    1      M01  2015011      3
7                    2      K01  2016012  4/2/3
8                    1      K02  2016012      2
9                    3      M01  2016012    3/1
10                   4      J01  2016012
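
As an alternative to trimming the accumulated string afterwards, the cap can be applied directly by stacking per-horse lags. This is a sketch rather than either answerer's method: the helper name recent_positions and its parameter n are hypothetical, and the frame is assumed to be already ordered by race_id.

```python
import pandas as pd

def recent_positions(df, n=6):
    """Join each horse's previous n finishing positions, oldest first.

    Hypothetical helper; assumes df is already ordered by race_id.
    """
    pos = df["finishing_position"].astype(str)
    grouped = pos.groupby(df["horse_id"])
    lags = list(range(n, 0, -1))
    # Column k holds the position from k runs ago (NaN if there is no such run).
    lagged = pd.concat([grouped.shift(k) for k in lags], axis=1, keys=lags)
    return lagged.apply(lambda row: "/".join(row.dropna()), axis=1)

df = pd.DataFrame({
    "finishing_position": [1, 2, 3, 4, 2, 3, 1, 2, 1, 3, 4],
    "horse_id": ["K01", "K02", "M01", "K01", "K01", "K01",
                 "M01", "K01", "K02", "M01", "J01"],
    "race_id": [2014011, 2014011, 2014011, 2014012, 2014021, 2014031,
                2015011, 2016012, 2016012, 2016012, 2016012],
})

df["recent"] = recent_positions(df, n=3)
```

With n=3 this should reproduce the recent column shown above, and with n=6 the output requested in the question, since no horse in the sample has more than four earlier runs.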

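Thanks, but where does this limit the accumulation to at most the previous 6 records?

Use something like: select *, row_number() over (partition by horse_id order by race_id desc) racesback, then filter on racesback however you want.

@goodBOB It seems you need a rolling sum, but that does not match your expected output.

@Wen Right, I tried a rolling sum and it adds all 6 values together. This is my solution: I can strip the extra positions from the new df after the cumsum.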
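
The row_number() idea from the comments has a rough pandas counterpart in groupby(...).cumcount(). The sketch below is an illustration under assumptions: the column name racesback is taken from the SQL snippet, and the filter keeps each horse's latest six runs overall, which is not the same as the per-row look-back asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "finishing_position": [1, 2, 3, 4, 2, 3, 1, 2, 1, 3, 4],
    "horse_id": ["K01", "K02", "M01", "K01", "K01", "K01",
                 "M01", "K01", "K02", "M01", "J01"],
    "race_id": [2014011, 2014011, 2014011, 2014012, 2014021, 2014031,
                2015011, 2016012, 2016012, 2016012, 2016012],
})

# Analogue of ROW_NUMBER() OVER (PARTITION BY horse_id ORDER BY race_id DESC):
# racesback is 1 for a horse's latest run, 2 for the run before it, and so on.
ordered = df.sort_values("race_id", ascending=False, kind="stable")
df["racesback"] = ordered.groupby("horse_id").cumcount() + 1

# Keep at most the six most recent runs of every horse.
last_six = df[df["racesback"] <= 6]
```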