Python: joining dataframes and calculating the distance from dates (Python, Pandas) - Fatal编程技术网


Given the two dataframes below, I'd like to join them in some way so that df gets a new column, "days_since_event".

print(df)
             t
date          
2021-01-01   0
2021-01-02   1
2021-01-03   2
2021-01-04   3
2021-01-05   4
2021-01-06   5
2021-01-07   6
2021-01-08   7
2021-01-09   8
2021-01-10   9
2021-01-11  10
2021-01-12  11
2021-01-13  12
2021-01-14  13
2021-01-15  14
          

print(events)
Empty DataFrame
Columns: []
Index: [2021-01-05 00:00:00, 2021-01-12 00:00:00]
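To experiment with the answers, the two frames printed above can be rebuilt as follows. This is a sketch that assumes the printed output is complete: a daily DatetimeIndex named date with a counter column t, and an events frame with no columns whose index holds the event dates. The construction itself is not part of the original question.

```python
import pandas as pd

# df: one row per day, a DatetimeIndex named "date" and a counter column "t"
df = pd.DataFrame(
    {"t": range(15)},
    index=pd.date_range("2021-01-01", periods=15, freq="D", name="date"),
)

# events: no columns, only a DatetimeIndex holding the event dates
events = pd.DataFrame(index=pd.to_datetime(["2021-01-05", "2021-01-12"]))

print(df)
print(events)
```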
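I don't see any obvious vectorized way to do this.

I thought maybe I could build a column of -1s, do a reverse cumulative sum on it, plus some other magic, but I haven't come up with a solution yet.

Edit 1: I have a solution based on


but it gives me a headache. I don't really like the
df.index>events.index[0]
part. Maybe there's a better solution I'm missing.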
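The reverse cumulative sum mentioned in the question does handle the days before the first event. A minimal sketch, assuming the daily frame shown above (rebuilt here so the block runs on its own; event_dates and minus_ones are illustrative names): put -1 on every row before the first event, then accumulate from the bottom up.

```python
import pandas as pd

df = pd.DataFrame(
    {"t": range(15)},
    index=pd.date_range("2021-01-01", periods=15, freq="D", name="date"),
)
event_dates = pd.to_datetime(["2021-01-05", "2021-01-12"])

# -1 on each row strictly before the first event, 0 everywhere else
minus_ones = pd.Series(
    (df.index < event_dates.min()).astype(int) * -1, index=df.index
)

# Reverse cumulative sum: reverse, accumulate, reverse back.
# The run of -1s becomes a countdown ending at -1 on the day before the event.
countdown = minus_ones[::-1].cumsum()[::-1]
print(countdown)
```

This only covers the rows before the first event; the rows after it still need a per-event counter, which the answers build with a cumulative sum over the event flags.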

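You can flag the event dates, create pseudo-groups that each start at an event, and then number the rows within each group (cumcount). That gets us almost there:

df['event'] = df.index.isin(events.index)
df['days_since_event'] = df.event.groupby(df.event.cumsum()).cumcount()
#              t  event  days_since_event
# date
# 2021-01-01   0  False                 0
# 2021-01-02   1  False                 1
# 2021-01-03   2  False                 2
# 2021-01-04   3  False                 3
# 2021-01-05   4   True                 0
# 2021-01-06   5  False                 1
# 2021-01-07   6  False                 2
# 2021-01-08   7  False                 3
# 2021-01-09   8  False                 4
# 2021-01-10   9  False                 5
# 2021-01-11  10  False                 6
# 2021-01-12  11   True                 0
# 2021-01-13  12  False                 1
# 2021-01-14  13  False                 2
# 2021-01-15  14  False                 3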
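The flag-and-group step can be run on its own to see the intermediate pseudo-event groups. This sketch rebuilds the question's frame so it is self-contained; the name group is illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"t": range(15)},
    index=pd.date_range("2021-01-01", periods=15, freq="D", name="date"),
)
events = pd.DataFrame(index=pd.to_datetime(["2021-01-05", "2021-01-12"]))

# True on event dates only
df["event"] = df.index.isin(events.index)

# Each event starts a new group: 0 before the first event, then 1, 2, ...
group = df["event"].cumsum()

# Position of each row inside its group = rows since that group's event
df["days_since_event"] = df["event"].groupby(group).cumcount()

print(pd.concat([group.rename("group"), df["days_since_event"]], axis=1))
```

Group 0 (the days before the first event) has no event to count from, which is why its counter starts at 0 and needs the fix-up that follows.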
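Then fix up the days before the first event:

event1 = df.event.argmax()  # position of the first True, i.e. the first event
df.loc[df.index[:event1 + 1], 'days_since_event'] = list(range(-event1, 1))
df = df.drop(columns='event')  # the helper flag is no longer needed
#              t  days_since_event
# date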
# 2021-01-01   0                -4
# 2021-01-02   1                -3
# 2021-01-03   2                -2
# 2021-01-04   3                -1
# 2021-01-05   4                 0
# 2021-01-06   5                 1
# 2021-01-07   6                 2
# 2021-01-08   7                 3
# 2021-01-09   8                 4
# 2021-01-10   9                 5
# 2021-01-11  10                 6
# 2021-01-12  11                 0
# 2021-01-13  12                 1
# 2021-01-14  13                 2
# 2021-01-15  14                 3
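The two steps can also be folded into one vectorized pass. This is a variant, not the answer's exact code: GroupBy.cumcount(ascending=False) counts down to the end of the first group, which replaces the argmax/range fix-up. Like the answer, it assumes exactly one row per day and at least one event.

```python
import pandas as pd

df = pd.DataFrame(
    {"t": range(15)},
    index=pd.date_range("2021-01-01", periods=15, freq="D", name="date"),
)
events = pd.DataFrame(index=pd.to_datetime(["2021-01-05", "2021-01-12"]))

is_event = pd.Series(df.index.isin(events.index), index=df.index)
group = is_event.cumsum()  # 0 before the first event, then 1, 2, ...

# Rows counted upward from each event: 0, 1, 2, ...
after = is_event.groupby(group).cumcount()
# Rows counted downward towards the next event: -n, ..., -2, -1
before = -is_event.groupby(group).cumcount(ascending=False) - 1

# Group 0 has no preceding event, so it takes the countdown instead
df["days_since_event"] = after.where(group > 0, before)
print(df)
```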

tdy's answer is definitely a good solution if the data looks exactly like the sample, i.e. if there is exactly one row for every day.

Personally, I'd rather do it like this: add an event column, set the event dates in it, and then do a simple ffill and bfill (in that order):

df.loc[df["date"].isin(event), "event"] = event
df
         date                event
0  2021-01-01                 None
1  2021-01-02                 None
2  2021-01-03                 None
3  2021-01-04                 None
4  2021-01-05  2021-01-05 00:00:00
5  2021-01-06                 None
6  2021-01-07                 None
7  2021-01-08                 None
8  2021-01-09                 None
9  2021-01-10                 None
10 2021-01-11                 None
11 2021-01-12  2021-01-12 00:00:00
12 2021-01-13                 None
13 2021-01-14                 None
14 2021-01-15                 None


df["event"] = df["event"].ffill().bfill()
df
         date      event
0  2021-01-01 2021-01-05
1  2021-01-02 2021-01-05
2  2021-01-03 2021-01-05
3  2021-01-04 2021-01-05
4  2021-01-05 2021-01-05
5  2021-01-06 2021-01-05
6  2021-01-07 2021-01-05
7  2021-01-08 2021-01-05
8  2021-01-09 2021-01-05
9  2021-01-10 2021-01-05
10 2021-01-11 2021-01-05
11 2021-01-12 2021-01-12
12 2021-01-13 2021-01-12
13 2021-01-14 2021-01-12
14 2021-01-15 2021-01-12
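Because the event column starts out as None, it has object dtype, and whether the filled column comes back as datetime64 (which the later .dt.days call needs) can depend on the pandas version. A sketch of one way to sidestep that, assuming the same date column and event list: build the event column with Series.where, so the gaps are NaT and the column is datetime64 from the start.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2021-01-01", periods=15, freq="D")})
event = pd.to_datetime(["2021-01-05", "2021-01-12"])

# Keep the date on event rows and NaT elsewhere; the column stays datetime64
df["event"] = df["date"].where(df["date"].isin(event))

# Carry each event forward, then back-fill the rows before the first event
df["event"] = df["event"].ffill().bfill()

# Signed distance in whole days
df["days_since"] = (df["date"] - df["event"]).dt.days
print(df)
```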
Clean up (convert to int if needed):


This is perfect. A very clear solution. Thanks!
df['reset'] = 0
df['val'] = -1
df.loc[df.index > events.index[0], 'val'] = 1
df.loc[df.index.isin(events.index), 'val'] = 0
df.loc[df.index.isin(events.index), 'reset'] = 1
df['cumsum'] = df['reset'].cumsum()
df['days_since_event'] = df.groupby(['cumsum'])['val'].cumsum()
df.drop(['reset', 'cumsum', 'val'], axis=1, inplace=True)
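from pandas import DataFrame as DF, to_datetime, to_timedelta

# One row per day from 2021-01-01 to 2021-01-15
df = DF(dict(date=[to_datetime("20210101") + to_timedelta(i, unit="D") for i in range(15)]))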
df["event"] = None
df
         date event
0  2021-01-01  None
1  2021-01-02  None
2  2021-01-03  None
3  2021-01-04  None
4  2021-01-05  None
5  2021-01-06  None
6  2021-01-07  None
7  2021-01-08  None
8  2021-01-09  None
9  2021-01-10  None
10 2021-01-11  None
11 2021-01-12  None
12 2021-01-13  None
13 2021-01-14  None
14 2021-01-15  None

# Set events
event = [to_datetime("20210105"), to_datetime("20210112")]

df.loc[df["date"].isin(event), "event"] = event
df
         date                event
0  2021-01-01                 None
1  2021-01-02                 None
2  2021-01-03                 None
3  2021-01-04                 None
4  2021-01-05  2021-01-05 00:00:00
5  2021-01-06                 None
6  2021-01-07                 None
7  2021-01-08                 None
8  2021-01-09                 None
9  2021-01-10                 None
10 2021-01-11                 None
11 2021-01-12  2021-01-12 00:00:00
12 2021-01-13                 None
13 2021-01-14                 None
14 2021-01-15                 None


df["event"] = df["event"].ffill().bfill()
df
         date      event
0  2021-01-01 2021-01-05
1  2021-01-02 2021-01-05
2  2021-01-03 2021-01-05
3  2021-01-04 2021-01-05
4  2021-01-05 2021-01-05
5  2021-01-06 2021-01-05
6  2021-01-07 2021-01-05
7  2021-01-08 2021-01-05
8  2021-01-09 2021-01-05
9  2021-01-10 2021-01-05
10 2021-01-11 2021-01-05
11 2021-01-12 2021-01-12
12 2021-01-13 2021-01-12
13 2021-01-14 2021-01-12
14 2021-01-15 2021-01-12
df["days_since"] = df["date"] - df["event"]
df
         date      event days_since
0  2021-01-01 2021-01-05    -4 days
1  2021-01-02 2021-01-05    -3 days
2  2021-01-03 2021-01-05    -2 days
3  2021-01-04 2021-01-05    -1 days
4  2021-01-05 2021-01-05     0 days
5  2021-01-06 2021-01-05     1 days
6  2021-01-07 2021-01-05     2 days
7  2021-01-08 2021-01-05     3 days
8  2021-01-09 2021-01-05     4 days
9  2021-01-10 2021-01-05     5 days
10 2021-01-11 2021-01-05     6 days
11 2021-01-12 2021-01-12     0 days
12 2021-01-13 2021-01-12     1 days
13 2021-01-14 2021-01-12     2 days
14 2021-01-15 2021-01-12     3 days
del df["event"]  # the helper column is no longer needed
df["days_since"] = df["days_since"].dt.days  # Timedelta -> whole days as int
df
         date  days_since
0  2021-01-01          -4
1  2021-01-02          -3
2  2021-01-03          -2
3  2021-01-04          -1
4  2021-01-05           0
5  2021-01-06           1
6  2021-01-07           2
7  2021-01-08           3
8  2021-01-09           4
9  2021-01-10           5
10 2021-01-11           6
11 2021-01-12           0
12 2021-01-13           1
13 2021-01-14           2
14 2021-01-15           3
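Because this approach subtracts dates instead of counting rows, it also works when some days are missing. The same lookup can be done without an intermediate event column using pandas.merge_asof. This is a swapped-in technique, not the answer's code, and the irregular dates below are made up for illustration.

```python
import pandas as pd

# Hypothetical irregular sample: several days are missing
df = pd.DataFrame({"date": pd.to_datetime(
    ["2021-01-01", "2021-01-03", "2021-01-05", "2021-01-09", "2021-01-12", "2021-01-15"]
)})
events = pd.DataFrame({"event": pd.to_datetime(["2021-01-05", "2021-01-12"])})

# merge_asof needs both frames sorted on their keys (they are here).
# Most recent event on or before each date (NaT before the first event)
prev = pd.merge_asof(df, events, left_on="date", right_on="event", direction="backward")
# Next event on or after each date, used only where there is no earlier event
nxt = pd.merge_asof(df, events, left_on="date", right_on="event", direction="forward")

anchor = prev["event"].fillna(nxt["event"])
df["days_since"] = (df["date"] - anchor).dt.days
print(df)
```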