Python HDFStore: selecting from nested columns
I have the following DataFrame, which is stored in an HDFStore object as a frame_table called data:
      shipmentid qty
catid              1  2  3  4  5
0              0   0  0  0  0  0
1              1   0  0  0  2  0
2              2   2  0  0  0  0
3              3   0  4  0  0  0
0              0   0  0  0  0  0
I want to do store.select('data', 'shipmentid==2'), but I get an error saying that 'shipmentid' is not defined:
ValueError: The passed where expression: shipmentid==2
contains an invalid variable reference
all of the variable refrences must be a reference to
an axis (e.g. 'index' or 'columns'), or a data_column
The currently defined references are: columns,index
What is the correct way to write this select statement?
EDIT: added sample code
import pandas as pd

def createFrame():
    # Tuple keys produce a two-level (nested) column index.
    data = {
        ('shipmentid', ''): {1: 1, 2: 2, 3: 3},
        ('qty', 1): {1: 5, 2: 5, 3: 5},
        ('qty', 2): {1: 6, 2: 6, 3: 6},
        ('qty', 3): {1: 7, 2: 7, 3: 7},
    }
    frame = pd.DataFrame(data)
    return frame

def createStore():
    # The table format is chosen per key in put(); recent pandas versions
    # reject a format argument in the HDFStore constructor.
    store = pd.HDFStore('sample.h5')
    return store

frame = createFrame()
print(frame)
print('\n')
print(frame.info())
store = createStore()
store.put('data', frame, format='t')
print('\n')
print(store)
# Raises ValueError: 'shipmentid' is not a data column of the stored table
results = store.select('data', 'shipmentid == 2')
store.close()
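To see why 'shipmentid == 2' cannot be resolved, it may help to inspect the column index of the sample frame. The sketch below rebuilds the same frame (assuming a reasonably recent pandas) and shows that the columns are a two-level MultiIndex of tuples such as ('shipmentid', '') rather than plain strings.

```python
import pandas as pd

# Same construction as the sample code: tuple keys become a two-level column index.
data = {
    ('shipmentid', ''): {1: 1, 2: 2, 3: 3},
    ('qty', 1): {1: 5, 2: 5, 3: 5},
    ('qty', 2): {1: 6, 2: 6, 3: 6},
    ('qty', 3): {1: 7, 2: 7, 3: 7},
}
frame = pd.DataFrame(data)

# The column labels are tuples, not plain strings.
print(isinstance(frame.columns, pd.MultiIndex))  # True
print(frame.columns.nlevels)                     # 2
print(list(frame.columns))

# Because the second level of ('shipmentid', '') is an empty string,
# indexing by the top-level name returns that single column as a Series.
shipment_ids = frame['shipmentid']
print(type(shipment_ids).__name__)               # Series
print(shipment_ids.tolist())                     # [1, 2, 3]
```

In memory, frame['shipmentid'] therefore works. In an HDFStore where expression, however, only the index and declared data columns can be referenced.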
I bet you used something like this to create your store:
In [207]:
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.randn(8, 2), columns=['shipmentid', 'qty'])
store = pd.HDFStore('borrar')
store.put('data', data, format='t')
If you then try to do a select, you do indeed get the error you describe:
In [208]:
store.select('data', 'shipmentid>0')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-211-5d0c4082cdcf> in <module>()
----> 1 store.select('data', 'shipmentid>0')
...
ValueError: The passed where expression: shipmentid>0
contains an invalid variable reference
all of the variable refrences must be a reference to
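(Honestly, I don't know why it works one way and not the other. My guess is that the first way doesn't let you specify data columns. But it's one of those things that can drive you crazy.)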
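For flat (single-level) columns, the error should go away once the column is declared as a data column when the table is written, because a where expression can only reference the index and declared data columns. Below is a minimal sketch. It assumes PyTables is installed, and the temporary file and the small example frame are illustrative, not taken from the question.

```python
import os
import tempfile

import pandas as pd

# A small flat-column frame (illustrative data, not from the question).
data = pd.DataFrame({'shipmentid': [1, 2, 3], 'qty': [5, 6, 7]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'flat_example.h5')  # hypothetical file name
    with pd.HDFStore(path, mode='w') as store:
        # Without data_columns, 'shipmentid == 2' raises the
        # "invalid variable reference" ValueError. Declaring the column
        # as a data column makes it queryable on disk.
        store.put('data', data, format='table', data_columns=['shipmentid'])
        result = store.select('data', 'shipmentid == 2')

print(result)
```

Passing data_columns=True instead would make every column queryable, at the cost of writing more index data.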
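EDIT:
With the updated code you posted, the DataFrame has a MultiIndex on its columns (nested columns). Similar updated code looks like this: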
In [273]:
import pandas as pd
from pandas import *
import random

def createFrame():
    data = {
        ('shipmentid', ''): {1: 1, 2: 2, 3: 3},
        ('qty', 1): {1: 5, 2: 5, 3: 5},
        ('qty', 2): {1: 6, 2: 6, 3: 6},
        ('qty', 3): {1: 7, 2: 7, 3: 7},
    }
    frame = pd.DataFrame(data)
    return frame

frame = createFrame()
print(frame)
print('\n')
print(frame.info())

frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table')
pd.read_hdf('sample.h5', 'data', 'shipmentid == 2')
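But I get an error (I guess it's the same one you get). As you pointed out in a comment after adding the full sample code, the problem seems to stem from using nested columns. I've updated the answer, though it may no longer really be an answer. Hope it helps.

Running it produces: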
  qty       shipmentid
    1  2  3
1   5  6  7          1
2   5  6  7          2
3   5  6  7          3
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 1 to 3
Data columns (total 4 columns):
(qty, 1) 3 non-null int64
(qty, 2) 3 non-null int64
(qty, 3) 3 non-null int64
(shipmentid, ) 3 non-null int64
dtypes: int64(4)
memory usage: 120.0 bytes
None
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-273-e10e811fc7c0> in <module>()
23 print(frame.info())
24
---> 25 frame.to_hdf('sample.h5', 'data', append=True, mode='w', data_columns=['shipmentid'], format='table')
26 pd.read_hdf('sample.h5','data', 'shipmentid == 2')
.....
stack trace
.....
ValueError: cannot use a multi-index on axis [1] with data_columns ['shipmentid']
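Because data_columns cannot be combined with a column MultiIndex, one possible workaround (a sketch, not the only option) is to flatten the nested column labels into plain strings before writing, declare shipmentid as a data column, and rebuild the nesting after reading. The underscore naming scheme, the name_map helper, and the temporary file are hypothetical choices for illustration. PyTables must be installed.

```python
import os
import tempfile

import pandas as pd

# The nested-column frame from the question.
data = {
    ('shipmentid', ''): {1: 1, 2: 2, 3: 3},
    ('qty', 1): {1: 5, 2: 5, 3: 5},
    ('qty', 2): {1: 6, 2: 6, 3: 6},
    ('qty', 3): {1: 7, 2: 7, 3: 7},
}
frame = pd.DataFrame(data)

# Hypothetical naming scheme: ('qty', 1) -> 'qty_1', ('shipmentid', '') -> 'shipmentid'.
# name_map remembers the original tuple for each flat name.
name_map = {
    (top if sub == '' else f'{top}_{sub}'): (top, sub)
    for top, sub in frame.columns
}
flat = frame.copy()
flat.columns = list(name_map)  # same order as frame.columns

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.h5')
    with pd.HDFStore(path, mode='w') as store:
        # Single-level columns, so data_columns is accepted here.
        store.put('data', flat, format='table', data_columns=['shipmentid'])
        result = store.select('data', 'shipmentid == 2')

# Restore the nested column labels on the (small) query result.
result.columns = pd.MultiIndex.from_tuples([name_map[c] for c in result.columns])
print(result)
```

The query now runs against the file, so only the matching rows are read. The nesting is restored only on the small result.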
A workaround that does work is to load the whole table back with get and filter it in memory:
new_frame = store.get('data')
print(new_frame[new_frame['shipmentid'] == 2])
<class 'pandas.io.pytables.HDFStore'>
File path: sample.h5
/data frame_table (typ->appendable,nrows->3,ncols->4,indexers->[index])
  qty       shipmentid
    1  2  3
2   5  6  7          2
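Loading the whole table with store.get and then filtering works, but it reads every row into memory. If the table is large, a possible middle ground is to read it in chunks with select(..., chunksize=...) and filter each chunk in memory. The sketch below assumes PyTables is installed and that the nested-column table was written without data_columns, as in the question. The temporary file and the tiny chunksize are illustrative only.

```python
import os
import tempfile

import pandas as pd

# The nested-column frame from the question.
data = {
    ('shipmentid', ''): {1: 1, 2: 2, 3: 3},
    ('qty', 1): {1: 5, 2: 5, 3: 5},
    ('qty', 2): {1: 6, 2: 6, 3: 6},
    ('qty', 3): {1: 7, 2: 7, 3: 7},
}
frame = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.h5')
    with pd.HDFStore(path, mode='w') as store:
        # Stored as a table without data_columns, as in the question.
        store.put('data', frame, format='table')

        # Iterate over the table a few rows at a time; each chunk is a
        # DataFrame with the nested columns, filtered in memory.
        # chunksize=2 is deliberately tiny so the loop runs more than once.
        matches = [
            chunk[chunk['shipmentid'] == 2]
            for chunk in store.select('data', chunksize=2)
        ]
        result = pd.concat(matches)

print(result)
```

Peak memory is bounded by the chunk size rather than the table size. However, every row is still read from disk, unlike a true where query on a data column.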