Python: time and memory cost of apply with a lambda

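I have the data frame below and want to combine two columns, one containing the date and the other containing the hour. For a data frame of 33672 rows, the code below takes about 5 seconds, which is too slow because my real data is 1000 times larger.

Does anyone have a more efficient way to do this?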

>>> tt
               DATE  level_2  VALUE
SCENARIO                           
s0000    2014-02-28        0  36.39
s0000    2014-02-28        1  34.17
s0000    2014-02-28        2  32.95
s0000    2014-02-28        3  32.84
s0000    2014-02-28        4  34.36
s0000    2014-02-28        5  36.32
s0000    2014-02-28        6  39.76
s0000    2014-02-28        7  40.66
s0000    2014-02-28        8  46.21
s0000    2014-02-28        9  47.19
s0000    2014-02-28       10  46.48
s0000    2014-02-28       11  46.84
s0000    2014-02-28       12  46.08
            ...      ...    ...

[33672 rows x 3 columns]

>>> import time
>>> timet = time.time()
>>> tt['DATES'] = tt.apply(lambda row: row['DATE'].replace(hour=row['level_2']), axis=1)
>>> print(time.time() - timet)
4.76399993896

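apply is only useful when you cannot vectorize the operation.

The following works in pandas >= 0.12 (in 0.14 you can use
pd.to_timedelta(df['hour'],unit='h')
instead of the astype calls):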

In [8]: df = DataFrame(dict(date = Timestamp('20140228'), hour = np.random.randint(0,50,size=1000000)))

In [9]: df.shape
Out[9]: (1000000, 2)

In [10]: %timeit df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')
1 loops, best of 3: 255 ms per loop

In [11]: (df['date'] + df['hour'].astype('timedelta64[h]').astype('timedelta64[ns]')).head()
Out[11]: 
0   2014-03-01 03:00:00
1   2014-02-28 23:00:00
2   2014-03-01 06:00:00
3   2014-03-01 06:00:00
4   2014-02-28 15:00:00
dtype: datetime64[ns]
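
For reference, here is a minimal sketch of the same vectorized idea applied to the column names from the question (DATE and level_2), using pd.to_timedelta as mentioned above. The small frame is a hypothetical stand-in for the asker's data, not the real 33672 rows, and it assumes DATE holds midnight timestamps and level_2 holds hours from 0 to 23.

```python
import pandas as pd

# Hypothetical miniature of the asker's frame: same column names, made-up rows.
tt = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2014-02-28"] * 4),
        "level_2": [0, 1, 2, 23],
        "VALUE": [36.39, 34.17, 32.95, 40.00],
    },
    index=pd.Index(["s0000"] * 4, name="SCENARIO"),
)

# Vectorized: turn the hour column into timedeltas and add it to the dates
# in one column-wise operation instead of calling a lambda once per row.
tt["DATES"] = tt["DATE"] + pd.to_timedelta(tt["level_2"], unit="h")

print(tt[["DATE", "level_2", "DATES"]])
```

Note that this adds hours rather than setting the hour field, so it matches row['DATE'].replace(hour=...) only under the assumptions above. An hour value of 24 or more rolls over into the next day here, as in the Out[11] output above, whereas replace would raise an error.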
