Pandas HDF5 select with where on non-natural-named columns


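In my continuing spree of exotic pandas/HDF5 issues, I ran into the following:

I have a series of non-natural-named columns (note: for a good reason, with negative numbers being "system" IDs, etc.), which normally doesn't cause a problem: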

fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'])

However, my select statement fails as soon as it includes a where clause on one of these columns:

>>> fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'], where=[('a-6', '=', [0, 25, 28])])
blablabla
File "/srv/www/li/venv/local/lib/python2.7/site-packages/tables/table.py", line 1251, in _required_expr_vars
    raise NameError("name ``%s`` is not defined" % var)
NameError: name ``a`` is not defined
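
The NameError is consistent with the where condition being evaluated as an expression, in which `a-6` reads as the variable `a` minus 6. As a rough illustration using only Python's standard `ast` module (this is not PyTables' own parser, which is merely assumed to behave similarly here), the names an expression refers to can be listed like this:

```python
import ast

def referenced_names(expression):
    """Return the set of variable names that a Python expression refers to."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

# 'a-6' is parsed as the subtraction `a - 6`, so the only name found is 'a'.
hyphenated = referenced_names("a-6 == 0")
# 'a_6' is a single valid identifier.
underscored = referenced_names("a_6 == 0")
print(hyphenated, underscored)
```

The hyphenated column never shows up as a name of its own, which matches the complaint that ``a`` is not defined.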

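Is there a way to work around this? I could rename the negative-valued columns from "a-1" to "a_1", but that would mean reloading all of the data in my system. That's rather a lot! :)

Suggestions are welcome.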

Here is a test table:

In [1]: df = DataFrame({ 'a-6' : [1,2,3,np.nan] })

In [2]: df
Out[2]: 
   a-6
0    1
1    2
2    3
3  NaN

In [3]: df.to_hdf('test.h5','df',mode='w',table=True)

In [5]: df.to_hdf('test.h5','df',mode='w',table=True,data_columns=True)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_kind'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_dtype'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
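
The pattern quoted in the warning can be applied to column names before anything is written. A minimal sketch, where `non_natural_names` and `suggest_rename` are hypothetical helpers (not part of pandas or PyTables) and replacing every offending character with an underscore is just one possible convention:

```python
import re

# Pattern copied from the NaturalNameWarning message above.
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

def non_natural_names(columns):
    """Return the column names that do not match the natural-name pattern."""
    return [name for name in columns if not NATURAL_NAME.match(name)]

def suggest_rename(name):
    """Replace characters outside [a-zA-Z0-9_] with '_' (hypothetical convention)."""
    renamed = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # A natural name cannot start with a digit, so prefix an underscore if needed.
    if re.match(r"[0-9]", renamed):
        renamed = "_" + renamed
    return renamed

columns = ["o", "a-6", "m-13"]
bad = non_natural_names(columns)
mapping = {name: suggest_rename(name) for name in bad}
print(mapping)
```

A mapping like this could be passed to `DataFrame.rename(columns=mapping)` before storing new data. It does not check for collisions, so a frame that already contains both `a-6` and `a_6` would need a different scheme.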

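There is a very hacky workaround, though it could be built into the code itself: do a variable substitution on the column names. The existing select routine (in master) hands the condition string to PyTables as written. If you instead do this: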

(Pdb) self.table.table.readWhere("(x>2.0)",
      condvars={ 'x' : getattr(self.table.table.cols,'a-6')})
array([(2, 3.0)], 
      dtype=[('index', '<i8'), ('a-6', '<f8')])
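
The substitution from the debugger session could be wrapped in a small helper. The sketch below is only an illustration under stated assumptions, not pandas' implementation: `build_condvars` is a hypothetical name, and `FakeCols` merely stands in for a PyTables `table.cols` object so that the snippet runs without an HDF5 file.

```python
def build_condvars(cols, placeholders):
    """Map placeholder names to column objects looked up with getattr().

    cols         -- object exposing columns as attributes (e.g. table.cols)
    placeholders -- dict of valid placeholder name -> real column name
    """
    return {alias: getattr(cols, column) for alias, column in placeholders.items()}

class FakeCols(object):
    """Stand-in for table.cols (assumption: real columns are reachable via getattr)."""

# setattr() accepts names that are not valid identifiers, just like getattr().
fake_cols = FakeCols()
setattr(fake_cols, "a-6", "<column a-6>")

condition = "(x > 2.0)"  # written in terms of the placeholder, not 'a-6'
condvars = build_condvars(fake_cols, {"x": "a-6"})
print(condition, condvars)

# With a real table this would be passed on as in the session above, e.g.
# table.readWhere(condition, condvars=condvars)
```

The condition only ever mentions the placeholder `x`, so the expression parser never sees the hyphen. Recent PyTables releases spell the method `read_where` rather than `readWhere`.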

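Thanks again, Jeff! I don't know whether this changed between 0.10/0.11 and 0.12, but I guess with the upcoming 0.13 release it doesn't matter anyway :) It might be a good idea to put this in the pandas HDF5 documentation as a warning to new users, though. pandas itself doesn't complain and normal data retrieval works, so it isn't obvious until you run into problems with a select statement.

Yep, will do. The pandas version doesn't matter (though in 0.13 the syntax allowed for where clauses is more flexible, so it's actually a pretty tricky problem).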