
Conditional filtering of a dataset in Python

python python-3.x pandas


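I am struggling with a filtering operation on a Stata file in Python 3. I have been asked to keep only the households without kids from the dataset/table.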

I used a filtering condition to select these rows from the table:

filtering_condition = df["kids"] > 0

df_nokids = df.loc[filtering_condition,"kids"]

However, this gives me an error I do not understand:

KeyError                                  Traceback (most recent call last)
/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item     (pandas/hashtable.c:12322)()

KeyError: 'kids'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-321-e72cd8a67065> in <module>()
      1 #keep only the households without kids and use this dataset for the   rest of the assignment
----> 2 filtering_condition = df["kids"] > 0
      3 df_nokids = df.loc[filtering_condition,"kids"]

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in  __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998 
   1999     def _getitem_column(self, key):

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in  _getitem_column(self, key)
   2002         # get column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005 
   2006         # duplicate columns & possible reduce dimensionality

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py    in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py     in get(self, item, fastpath)
   3288 
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))   [isnull(self.items)]

 /opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py    in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return     self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method,    tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: 'kids'


What am I doing wrong? Is there an explanation for this?

Thanks

Data file:

Do you mean something like this:

df_kids = df[df['kids']>0]

This selects the rows where the 'kids' column is greater than zero.
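As a minimal sketch of the boolean-mask filter above: the table below is made up, and its column names (`adults`, `kids`, `income`) and values are assumptions for illustration, not the asker's file. Note that `df['kids'] > 0` keeps households with kids, whereas the comment in the question's code asks for households without kids, which would be `df['kids'] == 0`.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "adults": [2, 2, 1, 1],
    "kids": [2, 0, 0, 1],
    "income": [758, 1785, 1200, 545],
})

# Boolean mask: rows where the condition is True are kept, with all columns.
df_kids = df[df["kids"] > 0]      # households with at least one kid
df_nokids = df[df["kids"] == 0]   # households without kids

# Passing a column label to .loc returns only that column as a Series,
# so the question's df.loc[mask, "kids"] would drop every other column.
kids_only = df.loc[df["kids"] == 0, "kids"]
```

If the whole table is needed for later steps, the plain mask form `df[mask]` (or `df.loc[mask]`) is probably the safer choice.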

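Can you provide runnable code? The stack trace shows a KeyError, which means that 'kids' does not exist in the original data. It would help if you posted the complete code.

Thanks for your reply. I think you may be right that I forgot a step earlier, since this is the complete code I used. I just read the Stata file first with pd.read_stata("data/alcohol.dta"). The table looks like this: , adults, kids, income, consumption 0 2 758 1 1 2 1785 1 2 0 1200 1 3 1 0 545 1 4 1 547 1

Thanks for the suggestion. However, entering this code produces a similar error. Maybe the column is not encoded correctly? Or maybe I need to do something with a numpy array?

Perhaps your data type is not what you expect (not integer). Check this with print(df.dtypes). If it does not show something like int64 for 'kids', convert the column to a numeric type first with df['kids'] = pd.to_numeric(df['kids']).

You are right: it says x and y are both floats. I now see that this reflects a task I had to complete in between, namely getting descriptive statistics of the data. I simply used df.describe(), and what I see now are those descriptive statistics; as for the kids data, I guess the describe code was wrong as well?

Well, float64 is no problem for the code I gave above. However, user567 correctly pointed out that you have a KeyError. This means your column is probably not named the way you expect. Could you print your df (or df.head() if it is long)? It would be best to add the output to your question.

Hmm, actually it is not df but pd. Would that make a difference?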
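The checks suggested in the comments can be sketched as follows. The frame here is hypothetical: it assumes the column label carries stray whitespace and capital letters, which is one common cause of `KeyError: 'kids'` but is not confirmed to be the asker's actual problem.

```python
import pandas as pd

# Hypothetical frame; the mislabeled ' Kids ' column is an assumption
# used to reproduce a KeyError, not the asker's real data.
df = pd.DataFrame({
    " Kids ": [2.0, 0.0, 1.0],
    "income": [758.0, 1785.0, 1200.0],
})

print(list(df.columns))  # shows the exact labels, including hidden spaces
print(df.dtypes)         # float64 works fine with numeric comparisons

has_kids_before = "kids" in df.columns  # the label does not match exactly

# Normalize the labels: strip surrounding whitespace and lower-case them.
df.columns = df.columns.str.strip().str.lower()
has_kids_after = "kids" in df.columns

# With a matching label, the filter no longer raises KeyError.
df_nokids = df[df["kids"] == 0]
```

Printing `df.columns` right after loading the file is usually the quickest way to see whether the expected label is really there.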