Python: columns with the difference between two timestamps


I have a pandas DataFrame that looks like this:

userID     timestamp                 other_data
1          2017-06-19 17:14:00.000   foo
1          2017-06-19 19:16:00.000   bar
1          2017-06-19 23:26:00.000   ter
1          2017-06-20 01:16:00.000   lol
2          2017-06-20 12:00:00.000   ter
2          2017-06-20 13:15:00.000   foo
2          2017-06-20 17:15:00.000   bar
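I want to add two columns, `time_since_previous_point` and `time_until_next_point`, but of course only between points of the same user. I don't really care about the units/format for now (as long as I can easily switch between them).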

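How can I do this? (Empty cells can be empty, `NaN` or `None`, whichever you think is best, bearing in mind that I will then compute descriptive statistics on the time since the previous point and the time until the next point.)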


Note that here I show `userID` as a single column, but in reality the only way I can identify a user is by a combination of columns (`country` + `userID`).
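I think what you're missing are the pandas `groupby` and `shift` methods.

Combining the two, you can do the following. Start by loading the sample data: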

from io import StringIO
import pandas as pd
csv = """userID,timestamp,other_data
1,2017-06-19 17:14:00.000,foo
1,2017-06-19 19:16:00.000,bar
1,2017-06-19 23:26:00.000,ter
1,2017-06-20 01:16:00.000,lol
2,2017-06-20 12:00:00.000,ter
2,2017-06-20 13:15:00.000,foo
2,2017-06-20 17:15:00.000,bar
"""

df = pd.read_csv(StringIO(csv))
which gives:

   userID                timestamp other_data
0       1  2017-06-19 17:14:00.000        foo
1       1  2017-06-19 19:16:00.000        bar
2       1  2017-06-19 23:26:00.000        ter
3       1  2017-06-20 01:16:00.000        lol
4       2  2017-06-20 12:00:00.000        ter
5       2  2017-06-20 13:15:00.000        foo
6       2  2017-06-20 17:15:00.000        bar
First, you need to convert the `timestamp` column to datetime:

df['timestamp'] = pd.to_datetime(df.timestamp)
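As an alternative that is not part of the original answer, the conversion can also happen while reading the CSV: `read_csv` accepts a `parse_dates` argument listing the columns to parse as datetimes. A minimal sketch using a few of the sample rows:

```python
from io import StringIO

import pandas as pd

# A few rows of the sample data from the question.
csv = """userID,timestamp,other_data
1,2017-06-19 17:14:00.000,foo
1,2017-06-19 19:16:00.000,bar
2,2017-06-20 12:00:00.000,ter
"""

# parse_dates makes read_csv convert the listed columns to datetime64
# while reading, so no separate pd.to_datetime call is needed afterwards.
df = pd.read_csv(StringIO(csv), parse_dates=['timestamp'])

print(df.dtypes)
```

Either way, `timestamp` ends up as a datetime column, which is what makes the subtraction in the next step produce timedeltas.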
Then combine the `groupby` and `shift` methods:

df['time_since_previous'] = df['timestamp'] - df.groupby('userID')['timestamp'].shift(1)
df['time_until_next'] = df.groupby('userID')['timestamp'].shift(-1) - df['timestamp']
This finally gives you what you want:

   userID           timestamp other_data  time_since_previous  time_until_next
0       1 2017-06-19 17:14:00        foo                  NaT         02:02:00
1       1 2017-06-19 19:16:00        bar             02:02:00         04:10:00
2       1 2017-06-19 23:26:00        ter             04:10:00         01:50:00
3       1 2017-06-20 01:16:00        lol             01:50:00              NaT
4       2 2017-06-20 12:00:00        ter                  NaT         01:15:00
5       2 2017-06-20 13:15:00        foo             01:15:00         04:00:00
6       2 2017-06-20 17:15:00        bar             04:00:00              NaT
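The question mentions that a user is really identified by `country` + `userID`. A minimal sketch of the same approach, assuming the DataFrame has a `country` column: `groupby` accepts a list of column names, so each group is one combination of the keys. Because `shift` works on the existing row order within each group, the sketch also sorts by the keys and `timestamp` first; the sample data above is already sorted, but real data may not be. The country values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: same userID in two different countries, rows unsorted.
df = pd.DataFrame({
    'country': ['FR', 'FR', 'US', 'FR'],
    'userID': [1, 1, 1, 1],
    'timestamp': pd.to_datetime([
        '2017-06-19 19:16', '2017-06-19 17:14',
        '2017-06-20 12:00', '2017-06-19 23:26',
    ]),
})

keys = ['country', 'userID']  # the combination that identifies a user

# shift() follows row order inside each group, so sort by user, then time.
df = df.sort_values(keys + ['timestamp']).reset_index(drop=True)

grouped = df.groupby(keys)['timestamp']
df['time_since_previous'] = df['timestamp'] - grouped.shift(1)
df['time_until_next'] = grouped.shift(-1) - df['timestamp']

print(df)
```

Here the `US` row gets `NaT` in both new columns: it is the only point of that (`country`, `userID`) pair, even though its `userID` matches the `FR` rows.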
The only thing left to do is to deal with the `NaT`s.
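One way to deal with them, sketched under the assumption that plain numbers are what you want for the descriptive statistics: divide the timedelta columns by a `pd.Timedelta` unit to get floats (`NaT` becomes `NaN`), then call `describe()`, which skips `NaN`. The `minutes_*` column names are hypothetical.

```python
import pandas as pd

# Sample data from the question, timestamps already parsed.
df = pd.DataFrame({
    'userID': [1, 1, 1, 1, 2, 2, 2],
    'timestamp': pd.to_datetime([
        '2017-06-19 17:14', '2017-06-19 19:16', '2017-06-19 23:26',
        '2017-06-20 01:16', '2017-06-20 12:00', '2017-06-20 13:15',
        '2017-06-20 17:15',
    ]),
})
grouped = df.groupby('userID')['timestamp']
df['time_since_previous'] = df['timestamp'] - grouped.shift(1)
df['time_until_next'] = grouped.shift(-1) - df['timestamp']

# Dividing a timedelta column by a Timedelta yields floats in that unit;
# NaT becomes NaN.
unit = pd.Timedelta(minutes=1)
df['minutes_since_previous'] = df['time_since_previous'] / unit
df['minutes_until_next'] = df['time_until_next'] / unit

# describe() ignores NaN, so the first/last point of each user simply
# does not contribute to the statistics.
stats = df[['minutes_since_previous', 'minutes_until_next']].describe()
print(stats)
```

Changing `unit` to `pd.Timedelta(hours=1)` or `pd.Timedelta(seconds=1)` switches the unit without touching the rest of the code.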