Python: read_hdf fails with 'all of the variable references must be a reference to an axis…' (Python, Pandas, HDF5, PyTables)


I'm stuck on the following call:

log_iter = pd.read_hdf(FN, dspath, 
                       where = [pd.Term('hashID','=',idList)],
                       iterator=True, 
                       chunksize=3000)

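The dataset at dspath has 35 columns and can be very large, which leads to a MemoryError when it is read in one go.

So I tried the iterator/chunksize route instead. But the 'where=' clause fails with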

ValueError: The passed where expression: [hashID=[147685,...,147197]]
        contains an invalid variable reference
        all of the variable refrences must be a reference to
        an axis (e.g. 'index' or 'columns'), or a data_column
        The currently defined references are: ** list of column names **

The problem is that hashID does not appear in that list of column names. However, if I run

read_hdf(FN, dspath).columns

hashID is among the columns. Any suggestions? My goal is to read all rows (x 35 columns) whose hashID is in idList.

Update. The code below works, and shows that once the dataset has been read in, hashID does exist as a column:

def dsIterator(self, q, idList):
    hID = u'hashID'
    FN = self.db._hdf_FN()
    dspath = self.getdatasetname(q)
    log_iter = pd.read_hdf(FN, dspath, 
                           #where = [pd.Term(u'logid_hashID','=',idList)],
                           iterator=True, 
                           chunksize=30000)
    n_all = 0
    retDF = None
    for dfChunk in log_iter:
        goodChunk = dfChunk.loc[dfChunk[hID].isin(idList)]
        if retDF is None : retDF = goodChunk
        else: 
            retDF = pd.concat([retDF, goodChunk], ignore_index=True)
        n_all += dfChunk[hID].count()
    n_ret = retDF[hID].count()
    return retDF
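The chunk-by-chunk filtering above can also be written so that the matching rows are collected in a list and concatenated once at the end, which avoids re-copying the growing result on every iteration. This is only a minimal sketch: `filter_chunks` is a hypothetical helper name, and it takes any iterable of DataFrames, so the iterator returned by `pd.read_hdf(..., iterator=True, chunksize=...)` is assumed to be passed in by the caller.

```python
import pandas as pd


def filter_chunks(chunks, column, wanted_ids):
    """Return (matching_rows, rows_scanned) for an iterable of DataFrame chunks.

    `chunks` is assumed to be something like the iterator returned by
    pd.read_hdf(..., iterator=True, chunksize=...).
    """
    wanted = list(wanted_ids)
    kept = []      # matching rows from each chunk
    n_all = 0      # total number of rows scanned
    for chunk in chunks:
        n_all += len(chunk)
        kept.append(chunk.loc[chunk[column].isin(wanted)])
    if not kept:
        # No chunks at all: return an empty frame instead of failing in concat
        return pd.DataFrame(), n_all
    # Concatenate once, after the loop
    return pd.concat(kept, ignore_index=True), n_all
```

With the names from the post, a call might look like `retDF, n_all = filter_chunks(log_iter, u'hashID', idList)`.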

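Passing where as a list with one condition per ID (the read_hdf call at the end of this post) does work. If idList is large, though, this is probably a bad idea.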

Note that I am using Python 2, so the column name 'hashID' has to be written as u'hashID'.

log_iter = pd.read_hdf(FN, dspath, 
              where = ['logid_hashID={:d}'.format(id_) for id_ in idList],
              iterator=True, 
              chunksize=3000)
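The error message in the question hints at a likely cause: in pandas, a where expression can only refer to an axis or to a column that was declared as a data column when the dataset was written in table format. The sketch below assumes the file can be rewritten that way; `write_queryable` and `read_matching` are hypothetical helper names, and whether the original file was stored in table format is not known from the post. It needs PyTables installed.

```python
import pandas as pd


def write_queryable(df, path, key, id_column):
    # format="table" stores the data in a queryable layout; data_columns
    # lists the columns that may be referenced in a where expression.
    df.to_hdf(path, key=key, format="table", data_columns=[id_column])


def read_matching(path, key, id_column, id_list, chunksize=3000):
    # Inline the IDs into the where string, e.g. "hashID=[1, 2, 3]".
    # Assumption: the IDs are integers. A very long list makes a very long string.
    ids = [int(i) for i in id_list]
    where = "{}={!r}".format(id_column, ids)
    chunks = pd.read_hdf(path, key, where=where,
                         iterator=True, chunksize=chunksize)
    parts = list(chunks)
    if not parts:
        # Nothing matched: return an empty frame instead of failing in concat
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True)
```

With the names from the post, the read side would be something like `read_matching(FN, dspath, 'hashID', idList)`, after the dataset has been written once with `write_queryable`.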