Python: calculating time differences in a DataFrame
I have a pandas DataFrame whose index looks like this:
Index([16/May/2013:23:56:43, 16/May/2013:23:56:42, 16/May/2013:23:56:43, ..., 17/May/2013:23:54:45, 17/May/2013:23:54:45, 17/May/2013:23:54:45], dtype=object)
I calculated the time difference between consecutive events as follows:
df2['tvalue'] = df2.index
df2['tvalue'] = np.datetime64(df2['tvalue'])
df2['delta'] = (df2['tvalue']-df2['tvalue'].shift()).fillna(0)
This gave me the following output:
Time tvalue delta
16/May/2013:23:56:43 2013-05-01 13:23:56 00:00:00
16/May/2013:23:56:42 2013-05-01 13:23:56 00:00:00
16/May/2013:23:56:43 2013-05-01 13:23:56 00:00:00
16/May/2013:23:56:43 2013-05-01 13:23:56 00:00:00
16/May/2013:23:56:48 2013-05-01 13:23:56 00:00:00
16/May/2013:23:56:48 2013-05-01 13:23:56 00:00:00
16/May/2013:23:56:48 2013-05-01 13:23:56 00:00:00
16/May/2013:23:57:44 2013-05-01 13:23:57 00:00:01
16/May/2013:23:57:44 2013-05-01 13:23:57 00:00:00
16/May/2013:23:57:44 2013-05-01 13:23:57 00:00:00
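But the time difference seems to be computed in hours, and the dates are wrong as well. What could be the problem here?
Parsing your dates correctly is the key step. I think strptime could be made to do it, but it didn't work for me. The sample times above are just strings, not datetimes.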
In [140]: from dateutil import parser
In [130]: def parse(x):
   .....:     date, hh, mm, ss = x.split(':')
   .....:     dd, mo, yyyy = date.split('/')
   .....:     return parser.parse("%s %s %s %s:%s:%s" % (yyyy, mo, dd, hh, mm, ss))
   .....:
In [131]: list(map(parse, idx))
Out[131]:
[datetime.datetime(2013, 5, 16, 23, 56, 43),
datetime.datetime(2013, 5, 16, 23, 56, 42),
datetime.datetime(2013, 5, 16, 23, 56, 43),
datetime.datetime(2013, 5, 17, 23, 54, 45),
datetime.datetime(2013, 5, 17, 23, 54, 45),
datetime.datetime(2013, 5, 17, 23, 54, 45)]
In [132]: pd.to_datetime(list(map(parse, idx)))
Out[132]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-16 23:56:43, ..., 2013-05-17 23:54:45]
Length: 6, Freq: None, Timezone: None
In [133]: df = pd.DataFrame(dict(time=pd.to_datetime(list(map(parse, idx)))))
In [134]: df
Out[134]:
time
0 2013-05-16 23:56:43
1 2013-05-16 23:56:42
2 2013-05-16 23:56:43
3 2013-05-17 23:54:45
4 2013-05-17 23:54:45
5 2013-05-17 23:54:45
In [138]: df['delta'] = (df['time']-df['time'].shift()).fillna(0)
In [139]: df
Out[139]:
time delta
0 2013-05-16 23:56:43 00:00:00
1 2013-05-16 23:56:42 -00:00:01
2 2013-05-16 23:56:43 00:00:01
3 2013-05-17 23:54:45 23:58:02
4 2013-05-17 23:54:45 00:00:00
5 2013-05-17 23:54:45 00:00:00
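As an alternative to the hand-written parse function, here is a minimal sketch that hands an explicit format string to pd.to_datetime. It assumes a reasonably recent pandas and that every string follows the day/abbreviated-month/year:hour:minute:second layout shown in the question. The sample idx below is rebuilt from the six values visible in the output above, and the %b month names may depend on the locale (English is assumed here).

```python
import pandas as pd

# Sample strings in the same layout as the index in the question
# (only the six values shown in the answer's output).
idx = pd.Index([
    "16/May/2013:23:56:43",
    "16/May/2013:23:56:42",
    "16/May/2013:23:56:43",
    "17/May/2013:23:54:45",
    "17/May/2013:23:54:45",
    "17/May/2013:23:54:45",
])

# %d/%b/%Y:%H:%M:%S = day/abbreviated month name/year:hour:minute:second.
# Assumption: month abbreviations are English ("May"), matching the sample data.
times = pd.to_datetime(idx, format="%d/%b/%Y:%H:%M:%S")

df = pd.DataFrame({"time": times})

# Difference to the previous row; the first row has no predecessor,
# so its NaT is replaced with a zero-length Timedelta.
df["delta"] = (df["time"] - df["time"].shift()).fillna(pd.Timedelta(0))
print(df)
```

Parsing the whole index in one call like this usually avoids running a Python function per element, which may help if the row-by-row parse turns out to be slow on a large frame.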
You can also use df.diff() (here, df['time'].diff()) instead of df['time'] - df['time'].shift(). Slightly cleaner.
@Jeff: Works great! :) But the code takes a while to run!
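To illustrate the diff() suggestion, a small sketch follows. It assumes the column already holds real datetimes (the three timestamps are made-up sample values in the style of the question) and checks that Series.diff() gives the same result as subtracting the shifted column. The delta_seconds column is an optional extra, not something the thread asked for.

```python
import pandas as pd

# Hypothetical sample data: already-parsed timestamps.
times = pd.to_datetime(
    ["2013-05-16 23:56:43", "2013-05-16 23:56:42", "2013-05-17 23:54:45"]
)
df = pd.DataFrame({"time": times})

# The two spellings compared in the comment above.
by_shift = df["time"] - df["time"].shift()
by_diff = df["time"].diff()

# Both leave NaT in the first row and agree everywhere else.
print(by_diff.equals(by_shift))

# Replace the leading NaT with a zero-length Timedelta, then expose the
# difference as plain seconds for easier arithmetic or plotting.
df["delta"] = by_diff.fillna(pd.Timedelta(0))
df["delta_seconds"] = df["delta"].dt.total_seconds()
print(df)
```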