DataFrame.ix() in pandas - is there an option to catch the case where a requested column does not exist?

dataframe pandas

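My code reads a CSV file into a pandas DataFrame and processes it. The code relies on column names and uses df.ix[,] to fetch columns. Recently some column names in the CSV file were changed (without notice). Instead of complaining, the code silently produced wrong results. The ix[,] construct does not check whether a column exists; if it does not, it simply creates the column and fills it with NaN. Here is the main idea of what happened: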

from pandas import DataFrame

df1=DataFrame({'a':[1,2,3],'b':[4,5,6]})   # columns 'a' & 'b'
df2=df1.ix[:,['a','c']]                    # trying to get 'a' & 'c'
print(df2)
       a   c
    0  1 NaN
    1  2 NaN
    2  3 NaN

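So it produces neither an error nor a warning.

Is there another way to select specific columns that also checks that the columns exist?

My current workaround is to use my own small utility function, something like this: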

import sys, inspect

def validate_cols_or_exit(df, cols):
    """
    Exits with an error message if the pandas DataFrame df
    doesn't have all columns from the provided list of columns.
    Example of usage:
      validate_cols_or_exit(mydf, ['col1', 'col2'])
    """
    dfcols = list(df.columns)
    valid_flag = True
    for c in cols:
        if c not in dfcols:
            print("Error, non-existent DataFrame column found - %s" % (c,))
            valid_flag = False
    if not valid_flag:
        # inspect.stack()[1][3] is the name of the calling function
        print("Error, non-existent DataFrame column(s) found in function %s" % inspect.stack()[1][3])
        print("valid column names are:")
        # map(str, ...) so that non-string column labels do not break join()
        print("\n".join(map(str, df.columns)))
        sys.exit(1)

Not sure whether you can constrain a DataFrame, but your helper function could be a lot simpler.
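One way such a simplification might look is to collect the missing names in a single pass and fail once. This is only a sketch: the name `require_columns` is hypothetical, and raising KeyError instead of calling sys.exit() is a choice made here for illustration.

```python
import pandas as pd

def require_columns(df, cols):
    """Return df restricted to cols; raise KeyError if any name is missing.

    Hypothetical helper, not part of pandas.
    """
    # Collect every missing name so the error reports all of them at once.
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise KeyError("Unknown column(s): %s" % ", ".join(map(str, missing)))
    return df[list(cols)]

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
subset = require_columns(df1, ['a', 'b'])    # both columns exist

try:
    require_columns(df1, ['a', 'c'])         # 'c' does not exist
except KeyError as err:
    message = str(err)
```

Raising an exception rather than exiting leaves the caller free to decide whether a missing column is fatal.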

How about:

In [3]: df1[['a', 'c']]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/wesm/code/pandas/<ipython-input-3-2349e89f1bb5> in <module>()
----> 1 df1[['a', 'c']]

/home/wesm/code/pandas/pandas/core/frame.py in __getitem__(self, key)
   1582             if com._is_bool_indexer(key):
   1583                 key = np.asarray(key, dtype=bool)
-> 1584             return self._getitem_array(key)
   1585         elif isinstance(self.columns, MultiIndex):
   1586             return self._getitem_multilevel(key)

/home/wesm/code/pandas/pandas/core/frame.py in _getitem_array(self, key)
   1609             mask = indexer == -1
   1610             if mask.any():
-> 1611                 raise KeyError("No column(s) named: %s" % str(key[mask]))
   1612             result = self.reindex(columns=key)
   1613             if result.columns.name is None:

KeyError: 'No column(s) named: [c]'
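If a custom message is wanted at the call site, the lookup can be wrapped in try/except. The following is a small sketch that assumes the same df1 as in the question. The name `select_or_report` is hypothetical, and because the wording of the KeyError message differs between pandas versions, the code does not depend on it.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def select_or_report(df, cols):
    """Return df[cols], or None after printing the valid column names.

    Hypothetical helper, not part of pandas.
    """
    try:
        # List-based column selection raises KeyError for unknown names.
        return df[cols]
    except KeyError as err:
        print("Column lookup failed: %s" % err)
        print("Valid column names are: %s" % ", ".join(map(str, df.columns)))
        return None

ok = select_or_report(df1, ['a', 'b'])     # DataFrame with columns 'a' and 'b'
bad = select_or_report(df1, ['a', 'c'])    # prints the messages, returns None
```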
