Pandas: rolling count over the past n days

I have the following DataFrame:

entry_time_flat           route_id      time_slot          

2019-09-02 00:00:00           1_2            0-6
2019-09-04 00:00:00           3_4            6-12
2019-09-06 00:00:00           1_2            0-6
2019-09-06 00:00:00           1_2           18-20
...
I want to create a final_df that, for each route_id and time_slot, counts the number of occurrences within the past n days, where n_days = 30.

To illustrate, I would like to obtain the following df:

print(final_df)

entry_time_flat           route_id      time_slot    n_occurrences        

2019-09-02 00:00:00           1            0-6             0
2019-09-04 00:00:00           3            6-12            0
2019-09-06 00:00:00           1            0-6             1
2019-09-06 00:00:00           1            18-20           0
...
How can I get this result efficiently?

You can use pd.DataFrame.rolling with a time offset:

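# set the date column as the index and make sure it is sorted
# (entry_time_flat must be a datetime column for an offset window to work)
df.set_index('entry_time_flat', inplace=True)
df.sort_index(inplace=True)

# define the offset
n_days = 30
offset = str(n_days) + 'D'

# helper column of ones: recent pandas versions only roll over numeric columns,
# so counting the string column route_id directly may raise an error
df['n_occurrences'] = 1

# count per (route_id, time_slot); closed='left' leaves the current row out
final_df = df.groupby(['route_id', 'time_slot'])['n_occurrences'].rolling(offset, closed='left').count()
final_df = final_df.fillna(0).astype(int)

# get the desired output format
final_df = final_df.reset_index()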
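
To try this on the sample rows from the question, here is a minimal self-contained sketch. It assumes entry_time_flat can be parsed as datetimes, and it rolls over a helper column of ones (named `_one` here purely for illustration) instead of the string column. The last step restores chronological order and the column order shown in the question, which is a presentation choice rather than part of the counting.

```python
import pandas as pd

# Sample rows copied from the question; the column dtypes are an assumption.
df = pd.DataFrame({
    "entry_time_flat": pd.to_datetime(
        ["2019-09-02", "2019-09-04", "2019-09-06", "2019-09-06"]
    ),
    "route_id": ["1_2", "3_4", "1_2", "1_2"],
    "time_slot": ["0-6", "6-12", "0-6", "18-20"],
})

n_days = 30

counts = (
    df.set_index("entry_time_flat")
      .sort_index()
      .assign(_one=1)  # hypothetical numeric helper column to roll over
      .groupby(["route_id", "time_slot"])["_one"]
      .rolling(f"{n_days}D", closed="left")  # window is [t - 30 days, t)
      .count()
      .fillna(0)       # an empty window may come back as NaN in some versions
      .astype(int)
      .rename("n_occurrences")
      .reset_index()
)

# Restore chronological order and the column order shown in the question.
final_df = (
    counts.sort_values("entry_time_flat", kind="stable")
          .reset_index(drop=True)
          [["entry_time_flat", "route_id", "time_slot", "n_occurrences"]]
)
print(final_df)
```

On these four rows, only the 2019-09-06 entry for route 1_2 in slot 0-6 has an earlier match inside the window, so it gets 1 and every other row gets 0.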

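Edit: it looks like you want the interval to be closed on the left, so that the current row is not counted. Changed the answer accordingly.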
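
To see what the closed argument changes, here is a small sketch on a made-up Series of ones (the three dates are illustrative, not from the question). With an offset window the default is closed='right', which counts the current row; closed='left' counts only rows strictly before it.

```python
import pandas as pd

# Illustrative data: one event on each of three dates.
s = pd.Series(
    1,
    index=pd.to_datetime(["2019-09-02", "2019-09-06", "2019-09-10"]),
)

# Default for offset windows (closed="right"): the current row is included.
right = s.rolling("30D").count()

# closed="left": the window is [t - 30 days, t), so the current row is excluded.
left = s.rolling("30D", closed="left").count().fillna(0)

print(pd.DataFrame({"right": right, "left": left}))
```

One consequence worth keeping in mind: with closed='left', rows that share exactly the same timestamp within a group do not count each other.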

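What is the logic behind this?

Group by time slot and route id, and count the rows of each subset: df['n'] = df.groupby(['route_id', 'time_slot']).cumcount(). Note that cumcount counts every earlier row in the group, not only those within the past n days.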
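
The difference between the two approaches only shows up once a group has rows older than the window. A rough sketch, using made-up dates (the 2019-07-01 row is an assumption added to fall outside 30 days) and a hypothetical helper column `_one`:

```python
import pandas as pd

# Illustrative data: the first row is more than 30 days before the others.
df = pd.DataFrame({
    "entry_time_flat": pd.to_datetime(["2019-07-01", "2019-09-02", "2019-09-06"]),
    "route_id": ["1_2", "1_2", "1_2"],
    "time_slot": ["0-6", "0-6", "0-6"],
}).sort_values("entry_time_flat")

# cumcount: number of earlier rows in the same group, with no time limit.
df["n_all_time"] = df.groupby(["route_id", "time_slot"]).cumcount()

# Rolling count: only earlier rows that fall inside the previous 30 days.
windowed = (
    df.set_index("entry_time_flat")
      .assign(_one=1)  # hypothetical numeric helper column
      .groupby(["route_id", "time_slot"])["_one"]
      .rolling("30D", closed="left")
      .count()
      .fillna(0)
      .astype(int)
)

print(df["n_all_time"].tolist())
print(windowed.tolist())
```

If the data never spans more than n days per group and timestamps within a group are unique, the two give the same numbers and cumcount is the simpler choice.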