Python: how to join two pandas time series
I have a price DataFrame (df1) that looks like this:
price
2007-01-01 00:00:00 0.789510
2007-01-01 04:00:00 0.789380
2007-01-01 20:00:00 0.789485
2007-01-02 01:00:00 0.791290
2007-01-02 02:00:00 0.791630
2007-01-02 16:00:00 0.793100
2007-01-02 17:00:00 0.793605
2007-01-03 18:00:00 0.780640
2007-01-03 19:00:00 0.780005
2007-01-03 20:00:00 0.779410
and a Series of closing prices (s1) that looks like this:
2007-01-01 15:00:00 0.7882
2007-01-02 15:00:00 0.7962
2007-01-03 15:00:00 0.7909
2007-01-04 15:00:00 0.7862
2007-01-05 15:00:00 0.7787
2007-01-08 15:00:00 0.7812
2007-01-09 15:00:00 0.7800
2007-01-10 15:00:00 0.7769
I want to add the closing prices from s1 to df1 so that df1's index is preserved and, for each timestamp in df1, the latest closing price is added from s1.
So the resulting DataFrame would look like this:
price closing_price
2007-01-01 00:00:00 0.789510 0.7882
2007-01-01 04:00:00 0.789380 0.7882
2007-01-01 20:00:00 0.789485 0.7962
2007-01-02 01:00:00 0.791290 0.7962
2007-01-02 02:00:00 0.791630 0.7962
2007-01-02 16:00:00 0.793100 0.7909
2007-01-02 17:00:00 0.793605 0.7909
2007-01-03 18:00:00 0.780640 0.7862
2007-01-03 19:00:00 0.780005 0.7862
2007-01-03 20:00:00 0.779410 0.7862
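You need to concatenate the closing prices to the DataFrame along the columns (axis=1). Then you need to forward-fill the closing prices. Finally, filter out the rows where price is null.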
import pandas as pd

# Daily closing prices at 15:00 on business days
s1 = pd.Series([0.7882, 0.7962, 0.7909, 0.7862, 0.7787, 0.7812, 0.7800, 0.7769],
               index=pd.date_range('2007-01-01 15:00', periods=8, freq='B'), name='close')
# Irregularly spaced prices
df1 = pd.DataFrame({'price': {
    pd.Timestamp('2007-01-01 00:00:00'): 0.789510,
    pd.Timestamp('2007-01-01 04:00:00'): 0.789380,
    pd.Timestamp('2007-01-01 20:00:00'): 0.789485,
    pd.Timestamp('2007-01-02 01:00:00'): 0.791290,
    pd.Timestamp('2007-01-02 02:00:00'): 0.791630,
    pd.Timestamp('2007-01-02 16:00:00'): 0.793100,
    pd.Timestamp('2007-01-02 17:00:00'): 0.793605,
    pd.Timestamp('2007-01-03 18:00:00'): 0.780640,
    pd.Timestamp('2007-01-03 19:00:00'): 0.780005,
    pd.Timestamp('2007-01-03 20:00:00'): 0.779410}})
# Align both objects on the union of their timestamps, in chronological order
df = pd.concat([df1, s1], axis=1).sort_index()
# Carry each close forward to the rows that follow it
# (assign the result instead of calling ffill(inplace=True) on a column selection)
df['close'] = df['close'].ffill()
# Keep only the rows that came from df1
df = df[df['price'].notna()]
>>> df
price close
2007-01-01 00:00:00 0.789510 NaN
2007-01-01 04:00:00 0.789380 NaN
2007-01-01 20:00:00 0.789485 0.7882
2007-01-02 01:00:00 0.791290 0.7882
2007-01-02 02:00:00 0.791630 0.7882
2007-01-02 16:00:00 0.793100 0.7962
2007-01-02 17:00:00 0.793605 0.7962
2007-01-03 18:00:00 0.780640 0.7909
2007-01-03 19:00:00 0.780005 0.7909
2007-01-03 20:00:00 0.779410 0.7909
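Note that the output above forward-fills, so each row carries the previous close and the first two rows are NaN, whereas the table in the question carries the first close at or after each timestamp. The following is a minimal sketch, not part of the original answer, that keeps the same concat approach but swaps the forward fill for a backward fill. The names `combined` and `result` are introduced here for illustration only.

```python
import pandas as pd

s1 = pd.Series(
    [0.7882, 0.7962, 0.7909, 0.7862, 0.7787, 0.7812, 0.7800, 0.7769],
    index=pd.date_range('2007-01-01 15:00', periods=8, freq='B'),
    name='close',
)
df1 = pd.DataFrame(
    {'price': [0.789510, 0.789380, 0.789485, 0.791290, 0.791630,
               0.793100, 0.793605, 0.780640, 0.780005, 0.779410]},
    index=pd.to_datetime([
        '2007-01-01 00:00', '2007-01-01 04:00', '2007-01-01 20:00',
        '2007-01-02 01:00', '2007-01-02 02:00', '2007-01-02 16:00',
        '2007-01-02 17:00', '2007-01-03 18:00', '2007-01-03 19:00',
        '2007-01-03 20:00',
    ]),
)

# Union of both indexes, sorted so that filling follows time order
combined = pd.concat([df1, s1], axis=1).sort_index()
# Backward fill: each row takes the next available close
combined['close'] = combined['close'].bfill()
# Drop the rows that exist only because of s1
result = combined[combined['price'].notna()]
print(result)
```

This still relies on filtering by null prices afterwards, so it assumes df1 itself has no missing prices.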
This is not really a "join" problem but a "reindexing" problem. pandas supports this, and it can be done in one line of code. See below.
df1['close'] = s1.reindex(df1.index, method='bfill')
This produces:
price close
2007-01-01 00:00:00 0.789510 0.7882
2007-01-01 04:00:00 0.789380 0.7882
2007-01-01 20:00:00 0.789485 0.7962
2007-01-02 01:00:00 0.791290 0.7962
2007-01-02 02:00:00 0.791630 0.7962
2007-01-02 16:00:00 0.793100 0.7909
2007-01-02 17:00:00 0.793605 0.7909
2007-01-03 18:00:00 0.780640 0.7862
2007-01-03 19:00:00 0.780005 0.7862
2007-01-03 20:00:00 0.779410 0.7862
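For comparison, the same lookup can also be written as an as-of merge. This is a sketch of an alternative that does not appear in the thread: it uses `pd.merge_asof` with `direction='forward'`, which picks the first close at or after each price timestamp, and checks that it agrees with the `reindex` one-liner. It uses a trimmed subset of the data above, and `merged` and `via_reindex` are illustrative names.

```python
import pandas as pd

s1 = pd.Series(
    [0.7882, 0.7962, 0.7909, 0.7862],
    index=pd.to_datetime(['2007-01-01 15:00', '2007-01-02 15:00',
                          '2007-01-03 15:00', '2007-01-04 15:00']),
    name='close',
)
df1 = pd.DataFrame(
    {'price': [0.789510, 0.789485, 0.793100, 0.780640]},
    index=pd.to_datetime(['2007-01-01 00:00', '2007-01-01 20:00',
                          '2007-01-02 16:00', '2007-01-03 18:00']),
)

# Both indexes must be sorted for merge_asof.
# direction='forward' matches each row of df1 with the next close in s1.
merged = pd.merge_asof(df1, s1.to_frame(), left_index=True, right_index=True,
                       direction='forward')

# The reindex approach from the answer, for comparison
via_reindex = s1.reindex(df1.index, method='bfill')

print(merged)
print(merged['close'].tolist() == via_reindex.tolist())
```

Switching to `direction='backward'` (or `method='ffill'` with `reindex`) would instead attach the most recent close before each timestamp.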
import pandas as pd; pd.concat([df1, s1]); is this what you are looking for?
I think that would just append the series to the end of the DataFrame.
Ah, I understand the question better now.
Thanks Alexander. Unfortunately, my real df1 contains a lot of NaNs before the concat, and also many 15:00:00 timestamps, so I don't think this solution will work the way I hoped.
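The concern in the last comment (missing prices and 15:00:00 timestamps already present in df1) affects the concat approach because it filters on null prices afterwards. The reindex approach never adds rows to df1, so it should not be affected. A small sketch under that assumption, using a hypothetical df1 with one NaN price and one timestamp that coincides with a close:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(
    [0.7882, 0.7962, 0.7909],
    index=pd.to_datetime(['2007-01-01 15:00', '2007-01-02 15:00',
                          '2007-01-03 15:00']),
    name='close',
)

# Hypothetical data: a missing price at 04:00 and a row exactly at 15:00
df1 = pd.DataFrame(
    {'price': [0.789510, np.nan, 0.789485, 0.791290]},
    index=pd.to_datetime(['2007-01-01 00:00', '2007-01-01 04:00',
                          '2007-01-01 15:00', '2007-01-02 01:00']),
)

# Look up s1 at df1's timestamps only; no rows are added or dropped.
# An exact match (15:00) takes that same day's close.
df1['close'] = s1.reindex(df1.index, method='bfill')
print(df1)
```

The row with the missing price is kept, and the 15:00:00 row receives the close recorded at that same timestamp.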