Pandas: adding the date to a parsed timestamp


I have several date-specific text files (e.g. 20150211.txt) that look like this:

TopOfBook       0x21    60      07:15:00.862    101     85      5       109     500     24      +
TopOfBook       0x21    60      07:15:00.882    101     91      400     109     500     18      +
TopOfBook       0x21    60      07:15:00.890    101     91      400     105     80      14      +
TopOfBook       0x21    60      07:15:00.914    101     93.3    400     105     80      11.7    +
where the 4th column contains the timestamp.

If I read one into pandas with automatic date parsing:

df_top = pd.read_csv('TOP_20150210.txt', sep='\t', names=hdr_top, parse_dates=[3])
I get:

0   TopOfBook   0x21    60  2015-05-17 07:15:00.862000  101 85.0    5   109.0   500 24.0    +
1   TopOfBook   0x21    60  2015-05-17 07:15:00.882000  101 91.0    400 109.0   500 18.0    +
2   TopOfBook   0x21    60  2015-05-17 07:15:00.890000  101 91.0    400 105.0   80  14.0    +

The time part is correct, of course, but how do I give these timestamps the correct date (2015-02-11)? Thanks.

You can use apply to construct a datetime with the desired date values and copy the time components into the constructor:

In [9]:
import datetime as dt
df[3] = df[3].apply(lambda x: dt.datetime(2015,2,11,x.hour,x.minute,x.second,x.microsecond))
df
Out[9]:
          0     1   2                          3    4     5    6    7    8   \
0  TopOfBook  0x21  60 2015-02-11 07:15:00.862000  101  85.0    5  109  500   
1  TopOfBook  0x21  60 2015-02-11 07:15:00.882000  101  91.0  400  109  500   
2  TopOfBook  0x21  60 2015-02-11 07:15:00.890000  101  91.0  400  105   80   
3  TopOfBook  0x21  60 2015-02-11 07:15:00.914000  101  93.3  400  105   80   

     9  10  
0  24.0  +  
1  18.0  +  
2  14.0  +  
3  11.7  +  
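
The apply call above runs a Python function once per row. A vectorized variant of the same idea might look like the sketch below. It is not part of the original answer: the sample Series and the names times, target_date and time_of_day are hypothetical stand-ins for the parsed column df[3].

```python
import pandas as pd

# Hypothetical stand-in for the parsed column: correct times, wrong date.
times = pd.Series(pd.to_datetime([
    "2015-05-17 07:15:00.862",
    "2015-05-17 07:15:00.882",
]))

# Assumed: the date the file actually covers.
target_date = pd.Timestamp("2015-02-11")

# Time elapsed since midnight, as a timedelta column.
time_of_day = times - times.dt.normalize()

# Re-anchor every time of day onto the target date.
fixed = time_of_day + target_date
print(fixed)
```

Because the time of day is separated from whatever date the parser filled in, this does not depend on which date that was.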

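After parsing the dates, column 3 has dtype datetime64[ns], so you can do date arithmetic on it directly. For example, subtracting 6 days: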

In [139]: df[3] - np.array([6], dtype='<m8[D]')
Out[139]: 
0   2015-05-11 07:15:00.862000
1   2015-05-11 07:15:00.882000
2   2015-05-11 07:15:00.890000
3   2015-05-11 07:15:00.914000
Name: 3, dtype: datetime64[ns]
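
The same kind of shift can probably also be written with a scalar pd.Timedelta, which avoids building a NumPy array. This is a sketch on made-up sample data, not the answer's own code; col is a hypothetical stand-in for df[3].

```python
import pandas as pd

# Hypothetical stand-in for the parsed column df[3].
col = pd.Series(pd.to_datetime([
    "2015-05-17 07:15:00.862",
    "2015-05-17 07:15:00.914",
]))

# Subtracting a scalar Timedelta shifts every row by the same amount.
shifted = col - pd.Timedelta(days=6)
print(shifted)
```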
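
To find the number of days to subtract, compare the date the parser filled in with the date encoded in the filename:

today = df.iloc[0,3]
date = pd.Timestamp(re.search(r'\d+', filename).group())
n = (today-date).days

Putting it all together:
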
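import re

import numpy as np
import pandas as pd

filename = '20150211.txt'
# Column 3 holds only a time, so the parser fills in a date of its own
df = pd.read_csv(filename, sep='\t', header=None, parse_dates=[3])
# Date the parser assigned (taken from the first row)
today = df.iloc[0,3]
# Date the file actually covers, taken from the digits in its name
date = pd.Timestamp(re.search(r'\d+', filename).group())
# Whole days between the two
n = (today-date).days
# Shift every timestamp back by n days
df[3] -= np.array([n], dtype='<m8[D]')
print(df)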
           0     1   2                          3    4     5    6    7    8  \
0  TopOfBook  0x21  60 2015-02-11 07:15:00.862000  101  85.0    5  109  500   
1  TopOfBook  0x21  60 2015-02-11 07:15:00.882000  101  91.0  400  109  500   
2  TopOfBook  0x21  60 2015-02-11 07:15:00.890000  101  91.0  400  105   80   
3  TopOfBook  0x21  60 2015-02-11 07:15:00.914000  101  93.3  400  105   80   

      9  
0  24.0  
1  18.0  
2  14.0  
3  11.7
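
Both answers above correct the date after the parser has already guessed one. An alternative that skips the guess is to read the time column as text and prepend the date from the filename before parsing. The sketch below is an assumption-laden illustration, not something from the original thread: it uses an in-memory sample in place of the real file, and the names sample and date_str are hypothetical.

```python
import io
import re

import pandas as pd

filename = "20150211.txt"

# Two tab-separated rows copied from the question, standing in for the file.
sample = (
    "TopOfBook\t0x21\t60\t07:15:00.862\t101\t85\t5\t109\t500\t24\t+\n"
    "TopOfBook\t0x21\t60\t07:15:00.882\t101\t91\t400\t109\t500\t18\t+\n"
)

# Read column 3 as plain text so pandas does not attach a date to it.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, dtype={3: str})

# Pull the YYYYMMDD digits out of the filename.
date_str = re.search(r"\d{8}", filename).group()

# Join date and time, then parse with an explicit format.
df[3] = pd.to_datetime(date_str + " " + df[3], format="%Y%m%d %H:%M:%S.%f")
print(df[3])
```

With a real file, io.StringIO(sample) would be replaced by the filename itself.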