Python: time and memory cost of apply with a lambda
I have the DataFrame below and want to combine two columns, one containing a date and the other an hour. For a DataFrame with 33672 rows, the code below takes about 5 seconds, which is a problem because my real data is 1000 times larger. Does anyone have a more efficient way to do this?
>>> tt
                DATE  level_2  VALUE
SCENARIO
s0000     2014-02-28        0  36.39
s0000     2014-02-28        1  34.17
s0000     2014-02-28        2  32.95
s0000     2014-02-28        3  32.84
s0000     2014-02-28        4  34.36
s0000     2014-02-28        5  36.32
s0000     2014-02-28        6  39.76
s0000     2014-02-28        7  40.66
s0000     2014-02-28        8  46.21
s0000     2014-02-28        9  47.19
s0000     2014-02-28       10  46.48
s0000     2014-02-28       11  46.84
s0000     2014-02-28       12  46.08
...              ...      ...    ...
[33672 rows x 3 columns]
>>> import time
>>> timet = time.time()
>>> tt['DATES'] = tt.apply(lambda row: row['DATE'].replace(hour=row['level_2']), axis=1)
>>> print(time.time() - timet)
4.76399993896
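apply is only useful when the operation cannot be vectorized. The approach below works in pandas >= 0.12 (in 0.14 you can use
pd.to_timedelta(df['hour'],unit='h')
instead of the astype calls).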
In [8]: df = DataFrame(dict(date = Timestamp('20140228'), hour = np.random.randint(0,50,size=1000000)))
In [9]: df.shape
Out[9]: (1000000, 2)
In [10]: %timeit df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')
1 loops, best of 3: 255 ms per loop
In [11]: (df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')).head()
Out[11]:
0 2014-03-01 03:00:00
1 2014-02-28 23:00:00
2 2014-03-01 06:00:00
3 2014-03-01 06:00:00
4 2014-02-28 15:00:00
dtype: datetime64[ns]
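Applied to the frame from the question, the same idea might look like the sketch below. The small tt frame here is a hypothetical stand-in built for illustration, not the asker's data, and it assumes DATE already holds midnight timestamps and level_2 holds whole hours. Note that adding a timedelta is not identical to replace(hour=...): it rolls over into the next day for hours of 24 or more, and it adds to any existing time of day instead of overwriting it.

```python
import pandas as pd

# Hypothetical miniature of the question's frame (illustration only).
tt = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2014-02-28"] * 4),
        "level_2": [0, 1, 2, 13],
        "VALUE": [36.39, 34.17, 32.95, 46.08],
    },
    index=pd.Index(["s0000"] * 4, name="SCENARIO"),
)

# Vectorized: convert the whole hour column to timedeltas once,
# then add it to the date column in a single column-wise operation.
tt["DATES"] = tt["DATE"] + pd.to_timedelta(tt["level_2"], unit="h")

print(tt)
```

Because the work happens on whole columns instead of one Python call per row, this avoids the per-row overhead that makes apply with axis=1 slow.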