Python: join DataFrames and compute the distance to a date
Given the two DataFrames below, I would like to join them somehow so that df gets a new column, days_since_event.
print(df)
t
date
2021-01-01 0
2021-01-02 1
2021-01-03 2
2021-01-04 3
2021-01-05 4
2021-01-06 5
2021-01-07 6
2021-01-08 7
2021-01-09 8
2021-01-10 9
2021-01-11 10
2021-01-12 11
2021-01-13 12
2021-01-14 13
2021-01-15 14
print(events)
Empty DataFrame
Columns: []
Index: [2021-01-05 00:00:00, 2021-01-12 00:00:00]
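The post only shows the printed frames, so the construction below is an assumption: a minimal sketch that rebuilds `df` (a daily `date` index with a counter column `t`) and `events` (a column-less frame whose index holds the two event dates).

```python
import pandas as pd

# Assumed reconstruction of the inputs; only their printed form appears in the post.
dates = pd.date_range("2021-01-01", periods=15, freq="D", name="date")
df = pd.DataFrame({"t": range(15)}, index=dates)

# A frame with an index but no columns prints as "Empty DataFrame".
events = pd.DataFrame(index=pd.to_datetime(["2021-01-05", "2021-01-12"]))

print(df)
print(events)
```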
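I don't see any obvious vectorized way to do this.
I thought about building a column of -1s, taking a reverse cumulative sum over it, and adding some other magic, but I haven't come up with a solution yet.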
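Edit 1:
I have a working solution, but it gives me a headache. I am not a big fan of the
df.index > events.index[0]
part. Maybe there is a better solution that I am missing.
You can flag the event dates, create pseudo event groups, and then cumcount within each group. This gets us almost there: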
df['event'] = df.index.isin(events.index)
df['days_since_event'] = df.event.groupby(df.event.cumsum()).cumcount()
#             t  event  days_since_event
# date
# 2021-01-01  0  False  0
# 2021-01-02  1  False  1
# 2021-01-03  2  False  2
# 2021-01-04  3  False  3
# 2021-01-05  4  True   0
# 2021-01-06  5  False  1
# 2021-01-07  6  False  2
# 2021-01-08  7  False  3
# 2021-01-09  8  False  4
# 2021-01-10  9  False  5
# 2021-01-11  10  False  6
# 2021-01-12  11  True   0
# 2021-01-13  12  False  1
# 2021-01-14  13  False  2
# 2021-01-15  14  False  3
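Then fix up the dates before the first event:
event1 = df.event.argmax()  # position of the first event row
df.loc[df.index[:event1 + 1], 'days_since_event'] = range(-event1, 1)  # .loc, since .at only accepts a single label
#             t  days_since_event
# date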
# 2021-01-01 0 -4
# 2021-01-02 1 -3
# 2021-01-03 2 -2
# 2021-01-04 3 -1
# 2021-01-05 4 0
# 2021-01-06 5 1
# 2021-01-07 6 2
# 2021-01-08 7 3
# 2021-01-09 8 4
# 2021-01-10 9 5
# 2021-01-11 10 6
# 2021-01-12 11 0
# 2021-01-13 12 1
# 2021-01-14 13 2
# 2021-01-15 14 3
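The two steps above can be folded into one helper. This is only a sketch: the function name `days_since_event_by_position` is hypothetical, and it counts rows rather than calendar days, so it matches the desired output only when `df` has exactly one row per day.

```python
import numpy as np
import pandas as pd

def days_since_event_by_position(df, events):
    """Row-count distance to the latest event (hypothetical helper).

    Equals the distance in days only if df has one row per day.
    """
    is_event = pd.Series(df.index.isin(events.index), index=df.index)
    # Each event row starts a new group; cumcount numbers the rows of a group from 0.
    out = is_event.groupby(is_event.cumsum()).cumcount()
    if is_event.any():
        first = int(is_event.to_numpy().argmax())  # position of the first event
        if first > 0:
            # Rows before the first event count up towards it: -first, ..., -1.
            out.iloc[:first] = np.arange(-first, 0)
    return out

# Assumed sample data matching the frames printed in the question.
dates = pd.date_range("2021-01-01", periods=15, freq="D", name="date")
df = pd.DataFrame({"t": range(15)}, index=dates)
events = pd.DataFrame(index=pd.to_datetime(["2021-01-05", "2021-01-12"]))

df["days_since_event"] = days_since_event_by_position(df, events)
print(df)
```

Keeping the flag in a local Series instead of a column means `df` gains only the result column.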
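tdy's answer is certainly a good solution if the data looks exactly like the sample, that is, if there is one row per day.
Personally, I would rather work from the event dates themselves. For comparison, the cumulative-sum solution from the question's edit: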
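df['reset'] = 0
df['val'] = -1                                     # rows before the first event step by -1
df.loc[df.index > events.index[0], 'val'] = 1      # rows after the first event step by +1
df.loc[df.index.isin(events.index), 'val'] = 0     # event rows contribute 0
df.loc[df.index.isin(events.index), 'reset'] = 1   # every event starts a new group
df['cumsum'] = df['reset'].cumsum()                # group id = number of events seen so far
df['days_since_event'] = df.groupby(['cumsum'])['val'].cumsum()  # running sum within each group
# Note: rows before the first event come out as -1, -2, ... in row order.
df.drop(['reset', 'cumsum', 'val'], axis=1, inplace=True)  # remove the helper columns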
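The idea is to set the dates in an event column and then do a simple ffill followed by a bfill (in that order).
After that, only the subtraction remains.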
A final cleanup converts the result to integer days, if needed.
This is perfect. A very clear solution. Thanks!
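from pandas import DataFrame as DF, to_datetime, to_timedelta
# One row per day, 2021-01-01 through 2021-01-15
df = DF(dict(date=[to_datetime("20210101") + to_timedelta(i, unit="D") for i in range(15)]))
df["event"] = None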
df
date event
0 2021-01-01 None
1 2021-01-02 None
2 2021-01-03 None
3 2021-01-04 None
4 2021-01-05 None
5 2021-01-06 None
6 2021-01-07 None
7 2021-01-08 None
8 2021-01-09 None
9 2021-01-10 None
10 2021-01-11 None
11 2021-01-12 None
12 2021-01-13 None
13 2021-01-14 None
14 2021-01-15 None
# Set events
event = [to_datetime("20210105"), to_datetime("20210112")]
df.loc[df["date"].isin(event), "event"] = event
df
date event
0 2021-01-01 None
1 2021-01-02 None
2 2021-01-03 None
3 2021-01-04 None
4 2021-01-05 2021-01-05 00:00:00
5 2021-01-06 None
6 2021-01-07 None
7 2021-01-08 None
8 2021-01-09 None
9 2021-01-10 None
10 2021-01-11 None
11 2021-01-12 2021-01-12 00:00:00
12 2021-01-13 None
13 2021-01-14 None
14 2021-01-15 None
df["event"] = to_datetime(df["event"]).ffill().bfill()  # to_datetime keeps the column datetime64 for the later subtraction
df
date event
0 2021-01-01 2021-01-05
1 2021-01-02 2021-01-05
2 2021-01-03 2021-01-05
3 2021-01-04 2021-01-05
4 2021-01-05 2021-01-05
5 2021-01-06 2021-01-05
6 2021-01-07 2021-01-05
7 2021-01-08 2021-01-05
8 2021-01-09 2021-01-05
9 2021-01-10 2021-01-05
10 2021-01-11 2021-01-05
11 2021-01-12 2021-01-12
12 2021-01-13 2021-01-12
13 2021-01-14 2021-01-12
14 2021-01-15 2021-01-12
df["days_since"] = df["date"] - df["event"]
df
date event days_since
0 2021-01-01 2021-01-05 -4 days
1 2021-01-02 2021-01-05 -3 days
2 2021-01-03 2021-01-05 -2 days
3 2021-01-04 2021-01-05 -1 days
4 2021-01-05 2021-01-05 0 days
5 2021-01-06 2021-01-05 1 days
6 2021-01-07 2021-01-05 2 days
7 2021-01-08 2021-01-05 3 days
8 2021-01-09 2021-01-05 4 days
9 2021-01-10 2021-01-05 5 days
10 2021-01-11 2021-01-05 6 days
11 2021-01-12 2021-01-12 0 days
12 2021-01-13 2021-01-12 1 days
13 2021-01-14 2021-01-12 2 days
14 2021-01-15 2021-01-12 3 days
del df["event"]
df["days_since"] = df["days_since"].dt.days
df
date days_since
0 2021-01-01 -4
1 2021-01-02 -3
2 2021-01-03 -2
3 2021-01-04 -1
4 2021-01-05 0
5 2021-01-06 1
6 2021-01-07 2
7 2021-01-08 3
8 2021-01-09 4
9 2021-01-10 5
10 2021-01-11 6
11 2021-01-12 0
12 2021-01-13 1
13 2021-01-14 2
14 2021-01-15 3
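The walk-through above can be condensed into a helper that works on a date index and subtracts real dates, so gaps between rows are handled. This is a sketch under assumptions: the name `days_since_event` and the irregular sample dates are made up for illustration, and an event date only counts if it also appears as a row in `df`.

```python
import pandas as pd

def days_since_event(df, events):
    """Days between each row's date and the governing event date (hypothetical helper).

    Rows before the first event are measured against that first event, so
    they come out negative. Event dates absent from df.index are ignored.
    """
    dates = df.index.to_series()
    # Keep the date on event rows, NaT everywhere else.
    marker = dates.where(df.index.isin(events.index))
    # ffill carries each event forward; bfill then covers rows before the first event.
    governing = marker.ffill().bfill()
    return (dates - governing).dt.days

# Irregularly spaced dates, where counting rows would not give days.
idx = pd.to_datetime(
    ["2021-01-01", "2021-01-04", "2021-01-05", "2021-01-09", "2021-01-12", "2021-01-20"]
)
df = pd.DataFrame({"t": range(len(idx))}, index=idx.rename("date"))
events = pd.DataFrame(index=pd.to_datetime(["2021-01-05", "2021-01-12"]))

df["days_since_event"] = days_since_event(df, events)
print(df)
```

Because the result comes from a date subtraction rather than a row counter, the row for 2021-01-20 is 8 days after the 2021-01-12 event even though it is the very next row.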