Python 3.x: Computing a rolling sum in a DataFrame based on two variable constraints
I want to create a variable, SumOfPrevious5OccurencesAtIDLevel, which is the sum of the previous 5 values of Var1 (ordered by the Date variable) at the ID level (column 1). If fewer than 5 previous values exist for that ID, it should take the value NA. Sample data and expected output:
ID Date Var1 SumOfPrevious5OccurencesAtIDLevel
1 1/1/2018 0 NA
1 1/2/2018 1 NA
1 1/3/2018 2 NA
1 1/4/2018 3 NA
2 1/1/2018 4 NA
2 1/2/2018 5 NA
2 1/3/2018 6 NA
2 1/4/2018 7 NA
2 1/5/2018 8 NA
2 1/6/2018 9 30
2 1/7/2018 10 35
2 1/8/2018 11 40
Use groupby with transform, together with the rolling and shift functions.
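Comment: What if the data is not sorted by ID and Date? Then df['new'] = df.sort_values(['ID','Date']).groupby('ID')['Var1'].transform(lambda x: x.rolling(5).sum().shift())?
Reply: @user3643528 - Good point. In that case the Date column needs to be converted to datetime and the data sorted by ID and Date, as in the edited answer: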
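import pandas as pd

# Parse Date so that sorting is chronological rather than by string order
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
# Needed only if the rows are not already sorted by ID and Date
df = df.sort_values(['ID','Date'])
# Per ID: sum each 5-row window, then shift one row so only previous values count
df['new'] = df.groupby('ID')['Var1'].transform(lambda x: x.rolling(5).sum().shift())
print(df)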
ID Date Var1 SumOfPrevious5OccurencesAtIDLevel new
0 1 2018-01-01 0 NaN NaN
1 1 2018-01-02 1 NaN NaN
2 1 2018-01-03 2 NaN NaN
3 1 2018-01-04 3 NaN NaN
4 2 2018-01-01 4 NaN NaN
5 2 2018-01-02 5 NaN NaN
6 2 2018-01-03 6 NaN NaN
7 2 2018-01-04 7 NaN NaN
8 2 2018-01-05 8 NaN NaN
9 2 2018-01-06 9 30.0 30.0
10 2 2018-01-07 10 35.0 35.0
11 2 2018-01-08 11 40.0 40.0
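
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby / transform / rolling / shift steps. The helper name sum_of_previous, its parameter n, and the shuffled variable are hypothetical names introduced only for this illustration. The rows are shuffled on purpose, to show that converting Date to datetime and sorting first gives the expected result.

```python
import pandas as pd

# Sample data from the question (Date kept as strings, as in the original table).
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    "Date": ["1/1/2018", "1/2/2018", "1/3/2018", "1/4/2018",
             "1/1/2018", "1/2/2018", "1/3/2018", "1/4/2018",
             "1/5/2018", "1/6/2018", "1/7/2018", "1/8/2018"],
    "Var1": list(range(12)),
})

# Shuffle the rows so that the sort step actually matters in this demo.
shuffled = df.sample(frac=1, random_state=0)


def sum_of_previous(frame, n=5):
    """Hypothetical helper: per-ID sum of the previous n Var1 values by Date."""
    out = frame.copy()
    # Parse the dates so the ordering is chronological, not lexicographic.
    out["Date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y")
    out = out.sort_values(["ID", "Date"])
    # rolling(n).sum() includes the current row; shift() moves each result
    # down one row so it covers only the n previous rows within the same ID.
    out["new"] = out.groupby("ID")["Var1"].transform(
        lambda x: x.rolling(n).sum().shift()
    )
    return out


result = sum_of_previous(shuffled)
print(result)
```

Because transform runs the lambda separately for each ID, neither the rolling window nor the shift carries values over from one ID to the next, which is why ID 1 (only four rows) stays entirely NaN.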