Python: fast conversion to datetime for 1.5 million rows


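Hi, I have a large DataFrame with 1.5 million rows; see the snapshot below. I want to create a datetime column by combining the Date and Time columns (both are strings). Parsing the two and combining them into a single datetime column takes a long time, because the operation is performed on each row individually.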

              Date          Time  Open 
Symbol                                                                         
VOD   02/25/2013  00:00:00.000     0 
VOD   02/25/2013  00:01:00.000     0 
VOD   02/25/2013  00:02:00.000     0 
VOD   02/25/2013  00:03:00.000     0 
VOD   02/25/2013  00:04:00.000     0 
VOD   02/25/2013  00:05:00.000     0 
VOD   02/25/2013  00:06:00.000     0 
I am using the following code to create the column:

aa=[datetime.strptime(str(df.DateMap[i])+' '+df.Time[i], '%m/%d/%Y %H:%M:%S.%f') for i in range(len(df))]
With 1.5 million rows, this takes a very long time to finish. Any ideas?

In [1]: data = """VOD   02/25/2013  00:00:00.000     0 
VOD   02/25/2013  00:01:00.000     0 
VOD   02/25/2013  00:02:00.000     0 
VOD   02/25/2013  00:03:00.000     0 
VOD   02/25/2013  00:04:00.000     0 
VOD   02/25/2013  00:05:00.000     0 
VOD   02/25/2013  00:06:00.000     0 """

In [2]: df = pd.read_csv(StringIO(data),sep=r'\s+',names=['ticker','date','time','value'])

In [3]: df2 = pd.concat([df]*100000*2)

In [4]: df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1400000 entries, 0 to 6
Data columns (total 4 columns):
ticker    1400000 non-null object
date      1400000 non-null object
time      1400000 non-null object
value     1400000 non-null int64
dtypes: int64(1), object(3)
memory usage: 53.4+ MB

In [5]: result1 = pd.to_datetime(df2['date'] + ' ' + df2['time'],format='%m/%d/%Y %H:%M:%S.%f')

In [6]: result2 = pd.to_datetime(df2['date'], format="%m/%d/%Y") +  pd.to_timedelta(df2['time'])
In [7]: result1.equals(result2)
Out[7]: True

In [9]: result1.head()
Out[9]: 
0   2013-02-25 00:00:00
1   2013-02-25 00:01:00
2   2013-02-25 00:02:00
3   2013-02-25 00:03:00
4   2013-02-25 00:04:00
dtype: datetime64[ns]

In [5]: %timeit pd.to_datetime(df2['date'], format="%m/%d/%Y") +  pd.to_timedelta(df2['time'])
1 loops, best of 3: 9.76 s per loop

In [4]: %timeit pd.to_datetime(df2['date'] + ' ' + df2['time'],format='%m/%d/%Y %H:%M:%S.%f')
1 loops, best of 3: 8.81 s per loop
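
For reference, the vectorized approach timed above can be written as a standalone script. This is only a minimal sketch: the three-row sample, the `raw` variable, and the new `DateTime` column name are assumptions made for illustration, with column names chosen to match the snapshot in the question rather than the asker's real data.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample mirroring the question's layout (Symbol, Date, Time, Open).
raw = """Symbol Date Time Open
VOD 02/25/2013 00:00:00.000 0
VOD 02/25/2013 00:01:00.000 0
VOD 02/25/2013 00:02:00.000 0
"""

df = pd.read_csv(StringIO(raw), sep=r"\s+", index_col="Symbol")

# Concatenate the two string columns once, then parse the whole column
# in a single vectorized call with an explicit format, instead of
# calling datetime.strptime once per row.
df["DateTime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"],
    format="%m/%d/%Y %H:%M:%S.%f",
)

print(df["DateTime"].dtype)
```

Passing an explicit `format` spares pandas from having to infer the layout of the strings, which probably matters at this row count; actual timings will depend on the pandas version and the machine.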