Python: Appending a MultiIndex DataFrame to an HDF File
Take end-of-day stock data as an example:
In [36]: df
Out[36]:
Code Name High Low Close Volume Change Change.2
0 AAAU Perth Mint Physical Gold ETF 16.8500 16.3900 16.6900 311400 0.0000 0.02
1 AADR Advisorshares Dorsey Wright ADR 49.8400 49.2300 49.6100 18500 -1.3000 2.54
2 AAMC Altisource Asset 24.0000 20.0000 23.9400 2500 0.3600 1.53
3 AAU Almaden Minerals 0.3987 0.3650 0.3684 355100 -0.0147 3.84
4 ABEQ Absolute Core Strategy ETF 23.2100 22.8200 23.1100 114700 -0.1900 0.82
... ... ... ... ... ... ... ... ...
26643 ZVLO Esoft Inc 0.0600 0.0600 0.0600 1000 0.0100 20
26644 ZVTK Zevotek Inc 0.0313 0.0209 0.0302 44900 0.0102 51
26645 ZXAIY China Zenix Auto International 0.1534 0.1534 0.1534 200 -0.1566 50.52
26646 ZYRX Zyrox Mining Intl Inc 0.0200 0.0181 0.0200 3000 0.0000 0
26647 ZZZOF Zinc One Resources Inc 0.0111 0.0111 0.0111 300 0.0000 0
Supplementary question:
There are several different ways to store this kind of data in HDF5.
I use this code to append the hierarchical DataFrame each new day:
import datetime as dt
import pandas as pd

# lod (a list of DataFrames) and wkd (the working directory path) are defined elsewhere
df = pd.concat(lod, ignore_index=True)
# remove columns that are not useful
df = df.drop(['Change.1', 'Change.2', 'Unnamed: 9'], axis=1)
df = df.dropna()
# add a Date column (yesterday's date)
df['Date'] = dt.datetime.today().date() - dt.timedelta(days=1)
# create the MultiIndex
df = df.set_index(['Date', 'Code', 'Name'])
# write the data to the HDF5 container
df.to_hdf(wkd + 'Database.h5', key='stocks', mode='a', format='table')
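The table is replaced instead of extended. What is going wrong?
The answer to my main question is simple: just add append=True. By default, to_hdf uses append=False, which overwrites the table already stored under that key.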
df.to_hdf(wkd + 'Database.h5', key='stocks', mode='a', format='table', append=True)
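To see the effect of append=True in isolation, here is a minimal sketch that writes two days to a temporary file and reads them back. The helper names make_day and append_day, the dates, and the second day's prices and volumes are made up for illustration; only the codes, names, and first-day values come from the table above. It assumes PyTables is installed and uses pd.Timestamp for the Date level rather than a datetime.date object.

```python
import os
import tempfile

import pandas as pd


def make_day(date, rows):
    """Build one day's frame indexed by (Date, Code, Name). Hypothetical helper."""
    df = pd.DataFrame(rows, columns=["Code", "Name", "Close", "Volume"])
    df["Date"] = pd.Timestamp(date)
    return df.set_index(["Date", "Code", "Name"])


def append_day(path, df, key="stocks"):
    """Append one day's rows to the table stored under `key`. Hypothetical helper."""
    df.to_hdf(path, key=key, mode="a", format="table", append=True)


# Temporary file so the sketch does not touch a real database.
path = os.path.join(tempfile.mkdtemp(), "Database.h5")

# Day 1 uses values from the sample table; day 2 values are invented.
day1 = make_day("2020-01-02", [
    ("AAAU", "Perth Mint Physical Gold ETF", 16.69, 311400),
    ("AAU", "Almaden Minerals", 0.3684, 355100),
])
day2 = make_day("2020-01-03", [
    ("AAAU", "Perth Mint Physical Gold ETF", 16.70, 250000),
    ("AAU", "Almaden Minerals", 0.3700, 120000),
])

append_day(path, day1)
append_day(path, day2)

# Both days should now be present under the same key.
stored = pd.read_hdf(path, key="stocks")
print(stored)
```

One thing to watch when appending: the column dtypes of each new day must match those of the table created by the first write, so it helps to build every day's frame the same way.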
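Edit:
My current answer to the supplementary question is:
I think using the third method is fine, because querying a MultiIndex DataFrame through the on-disk pandas HDFStore object is easy: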
store.select('stocks', "Code=BMWYY")
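The query above can be tried end to end with a small self-contained sketch. The dates and the second day's prices are invented sample values, and the file lives in a temporary directory; it assumes PyTables is installed. The MultiIndex levels are stored as queryable columns in table format, which is why Code can appear in the where expression.

```python
import os
import tempfile

import pandas as pd

# Made-up two-day sample; codes, names, and first-day closes follow the table above.
rows = [
    ("2020-01-02", "AAAU", "Perth Mint Physical Gold ETF", 16.69),
    ("2020-01-02", "AAU", "Almaden Minerals", 0.3684),
    ("2020-01-03", "AAAU", "Perth Mint Physical Gold ETF", 16.70),
    ("2020-01-03", "AAU", "Almaden Minerals", 0.3700),
]
df = pd.DataFrame(rows, columns=["Date", "Code", "Name", "Close"])
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index(["Date", "Code", "Name"])

path = os.path.join(tempfile.mkdtemp(), "Database.h5")

# Write the frame in table format so it can be queried on disk.
with pd.HDFStore(path, mode="w") as store:
    store.append("stocks", df)

# Read back only the rows for one code; the rest stays on disk.
with pd.HDFStore(path, mode="r") as store:
    aaau = store.select("stocks", where="Code == 'AAAU'")

print(aaau)
```

Quoting the value, as in "Code == 'AAAU'", makes it explicit that a string literal is meant rather than a variable name.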
To add new data such as company fundamentals, I simply add a new table object to the HDF file. I then query both tables and do the further analysis in pandas.
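A rough sketch of that two-table layout follows. The key name fundamentals and the Category column are hypothetical, chosen only for illustration, and the file is a temporary one; it assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

# Price rows taken from the sample table; the date is made up.
prices = pd.DataFrame(
    [
        ("2020-01-02", "AAAU", "Perth Mint Physical Gold ETF", 16.69),
        ("2020-01-02", "AAU", "Almaden Minerals", 0.3684),
    ],
    columns=["Date", "Code", "Name", "Close"],
)
prices["Date"] = pd.to_datetime(prices["Date"])
prices = prices.set_index(["Date", "Code", "Name"])

# Hypothetical fundamentals table keyed by Code; "Category" is an invented column.
fundamentals = pd.DataFrame(
    {"Code": ["AAAU", "AAU"], "Category": ["ETF", "Mining"]}
).set_index("Code")

path = os.path.join(tempfile.mkdtemp(), "Database.h5")

# Each dataset gets its own key (table object) in the same HDF5 file.
with pd.HDFStore(path, mode="w") as store:
    store.append("stocks", prices)
    store.append("fundamentals", fundamentals)

# Query both tables, then combine them in memory with pandas.
with pd.HDFStore(path, mode="r") as store:
    p = store.select("stocks")
    f = store.select("fundamentals")

merged = p.reset_index().merge(f.reset_index(), on="Code")
print(merged)
```

Keeping the slowly changing fundamentals in a separate table avoids repeating them on every daily price row; the join happens only when an analysis needs both.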