Python: Inserting a DataFrame into an HDFStore as a dataset


I have run into a problem with pandas' HDFStore: I cannot access the data the same way I can when I retrieve it with h5py.File. Here is the code snippet:

In [1]: import pandas as pd  

In [2]: import numpy as np  

In [3]: import h5py as h5

In [4]: hdf = pd.HDFStore("tmp.h5")

In [5]: hdf.put('tables/t1', pd.DataFrame(np.random.rand(20,5)))

In [6]: hdf.put('t2', pd.DataFrame(np.random.rand(10,5)))

In [7]: 

In [7]: hdf.close() 

In [8]: 

In [8]: ############ Read using pd.HDFStore ############

In [9]: 

In [9]: data = pd.HDFStore ("tmp.h5") 

In [10]: data["tables/t1"] 
Out[10]: 
           0         1         2         3         4
0   0.384926  0.712066  0.022438  0.686217  0.942678
1   0.079548  0.466799  0.575394  0.276646  0.514414
2   0.672582  0.828567  0.801799  0.296046  0.124042
3   0.568058  0.931348  0.225348  0.547913  0.736184
4   0.496768  0.419699  0.724118  0.313427  0.353825
5   0.771868  0.963346  0.523821  0.793295  0.052085
6   0.358478  0.845149  0.334389  0.674448  0.239096
7   0.454559  0.604438  0.183654  0.027641  0.186922
8   0.776586  0.155783  0.253801  0.123986  0.560601
9   0.201239  0.932080  0.040997  0.119049  0.154076
10  0.753566  0.770133  0.123285  0.112419  0.353622
11  0.040959  0.384800  0.806119  0.247106  0.013442
12  0.739205  0.100547  0.855418  0.774874  0.710557
13  0.865856  0.565094  0.815860  0.816869  0.834415
14  0.251312  0.624995  0.976317  0.854855  0.744861
15  0.179678  0.435902  0.602303  0.118516  0.386935
16  0.452009  0.973729  0.067736  0.097811  0.292619
17  0.285994  0.569845  0.584602  0.001671  0.422877
18  0.727996  0.291086  0.736912  0.960595  0.132891
19  0.356397  0.747693  0.458485  0.100849  0.072220

In [11]: ## Success 

In [12]: data ["tables"]["t1"] 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-c7599d16a7b6> in <module>()
----> 1 data ["tables"]["t1"]

/usr/conda/lib/python2.7/site-packages/pandas/io/pytables.py in __getitem__(self, key)
    415 
    416     def __getitem__(self, key):
--> 417         return self.get(key)
    418 
    419     def __setitem__(self, key, value):

/usr/conda/lib/python2.7/site-packages/pandas/io/pytables.py in get(self, key)
    632         if group is None:
    633             raise KeyError('No object named %s in the file' % key)
--> 634         return self._read_group(group)
    635 
    636     def select(self, key, where=None, start=None, stop=None, columns=None,

/usr/conda/lib/python2.7/site-packages/pandas/io/pytables.py in _read_group(self, group, **kwargs)
   1268 
   1269     def _read_group(self, group, **kwargs):
-> 1270         s = self._create_storer(group)
   1271         s.infer_axes()
   1272         return s.read(**kwargs)

/usr/conda/lib/python2.7/site-packages/pandas/io/pytables.py in _create_storer(self, group, format, value, append, **kwargs)
   1151                 else:
   1152                     raise TypeError(
-> 1153                         "cannot create a storer if the object is not existing "
   1154                         "nor a value are passed")
   1155             else:

TypeError: cannot create a storer if the object is not existing nor a value are passed

In [13]: 

In [13]: data.close() 

In [14]: 

In [14]: ########### Read using h5py.File ############## 

In [15]: 

In [15]: data = h5.File("tmp.h5","r") 

In [16]: 

In [16]: data["tables"]
Out[16]: <HDF5 group "/tables" (1 members)>

In [17]: 

In [17]: data["tables"]["t1"]
Out[17]: <HDF5 group "/tables/t1" (4 members)>

In [18]: 

In [18]: data['tables']['t1'].keys ()
Out[18]: [u'axis0', u'axis1', u'block0_items', u'block0_values']

In [19]: [u'axis0', u'axis1', u'block0_items', u'block0_values']
Out[19]: [u'axis0', u'axis1', u'block0_items', u'block0_values']

In [20]: 

In [20]: data['tables']['t1']['block0_values'].value
Out[20]: 
array([[ 0.38492571,  0.71206567,  0.02243773,  0.68621713,  0.9426783 ],
       [ 0.07954806,  0.4667994 ,  0.57539433,  0.27664603,  0.51441446],
       [ 0.67258161,  0.82856681,  0.80179916,  0.29604625,  0.12404214],
       [ 0.56805845,  0.93134797,  0.22534757,  0.54791294,  0.73618366],
       [ 0.49676792,  0.41969943,  0.72411835,  0.31342698,  0.35382463],
       [ 0.77186804,  0.96334586,  0.52382094,  0.7932945 ,  0.05208528],
       [ 0.3584784 ,  0.84514863,  0.33438851,  0.6744483 ,  0.23909552],
       [ 0.45455901,  0.6044383 ,  0.18365449,  0.02764097,  0.18692162],
       [ 0.77658631,  0.15578276,  0.25380109,  0.12398617,  0.56060138],
       [ 0.20123928,  0.93207974,  0.04099724,  0.11904895,  0.15407568],
       [ 0.75356644,  0.77013349,  0.12328475,  0.11241904,  0.35362213],
       [ 0.04095888,  0.38480023,  0.80611853,  0.24710571,  0.01344193],
       [ 0.73920528,  0.1005474 ,  0.85541761,  0.7748739 ,  0.71055697],
       [ 0.86585587,  0.5650938 ,  0.81586031,  0.81686915,  0.83441517],
       [ 0.25131205,  0.62499501,  0.97631707,  0.85485518,  0.74486096],
       [ 0.17967805,  0.43590236,  0.60230302,  0.11851596,  0.38693535],
       [ 0.4520091 ,  0.97372923,  0.0677363 ,  0.09781059,  0.29261929],
       [ 0.28599448,  0.56984462,  0.5846021 ,  0.00167063,  0.42287738],
       [ 0.72799625,  0.29108631,  0.7369122 ,  0.96059508,  0.13289119],
       [ 0.35639696,  0.7476934 ,  0.45848456,  0.10084881,  0.07221995]])

In [21]: 

In [21]: ######################## End ############### 

In [22]: 

In [22]: 
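I want to access the data as data['tables']['t1'], and I am stuck on this. I have observed that pandas inserts each DataFrame into the HDF5 file as a group. I would like to insert it as a dataset instead, so that I can access the data easily.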

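According to the docs for HDFStore:

Warning: Hierarchical keys cannot be retrieved as dotted (attribute) access as described above for items stored under the root node. Instead, use explicit string-based keys.

So the two modules load the HDF5 file differently: pd.HDFStore expects the full string key, as in data["tables/t1"], whereas h5py.File lets you index one group at a time, as in data["tables"]["t1"].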
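If the goal is for data['tables']['t1'] to be a plain HDF5 dataset rather than a pandas-managed group, one option is to bypass HDFStore and write the values with h5py directly. The following is only a minimal sketch under some assumptions: the frame holds a single numeric dtype, the file name tmp_datasets.h5 and the "columns" attribute are illustrative choices rather than anything pandas defines, and the row index is not stored.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(20, 5))

# Hypothetical file location, chosen for this sketch only.
path = os.path.join(tempfile.mkdtemp(), "tmp_datasets.h5")

with h5py.File(path, "w") as f:
    # h5py creates the intermediate group "tables" automatically.
    dset = f.create_dataset("tables/t1", data=df.values)
    # Assumption: keep the column labels in an attribute named "columns".
    dset.attrs["columns"] = np.asarray(df.columns, dtype="int64")

with h5py.File(path, "r") as f:
    node = f["tables"]["t1"]  # group-by-group indexing now reaches a dataset
    is_dataset = isinstance(node, h5py.Dataset)
    # node[...] reads the whole dataset into a NumPy array.
    restored = pd.DataFrame(node[...], columns=node.attrs["columns"])

print(is_dataset)
print(restored.shape)
```

The trade-off is that a file written this way does not follow the layout pd.HDFStore uses, so it should be read back with h5py rather than with HDFStore, and anything beyond the raw values (index, mixed dtypes) has to be stored and restored by hand.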