Python 2.7 pandas: pandas._libs.hashtable.Int64HashTable.get_item

I have the following code running on a DataFrame df:

  print df
  categories = df['my_classification'].unique()
  for c in categories:
    print c
    win = df[df.result == 'Won'][df['my_classification'] == c]['prob'][0]

    print type(win)
    lost = df[df.result == 'Lost'][df['my_classification'] == c]['prob'][0]
    print type(lost)

Then I got the following output:

   result          my_classification      prob
0  Won                   ENTERPRISE      0.657895
1  Won                   COMMERCIAL      0.342105
2  Lost                  ENTERPRISE      0.611842
3  Lost                  COMMERCIAL      0.388158
ENTERPRISE
<type 'numpy.float64'>

And the error:

There was a problem running this cell
KeyError 0 
KeyErrorTraceback (most recent call last)
<ipython-input-4-38a901f9868a> in <module>()
     38 
     39     print type(win)
---> 40     lost = df[df.result == 'Lost'][df['my_classification'] == c]['prob'][0]
     41 
     42     print type(lost)

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    599         key = com._apply_if_callable(key, self)
    600         try:
--> 601             result = self.index.get_value(self, key)
    602 
    603             if not is_scalar(result):

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_value(self, series, key)
   2426         try:
   2427             return self._engine.get_value(s, k,
-> 2428                                           tz=getattr(series.dtype, 'tz', None))
   2429         except KeyError as e1:
   2430             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4363)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4046)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13913)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13857)()

KeyError: 0

What I don't understand: win and lost have exactly the same form, so why does win work while lost raises an error? Thanks.
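A quick way to see what the trailing [0] does, sketched under the assumption that df is exactly the four-row frame printed above with its default 0..3 index. Boolean filtering keeps each surviving row's original index label, and on an integer index series[0] looks up the label 0 rather than the first position. The Won/ENTERPRISE row still carries label 0, but the Lost/ENTERPRISE row carries label 2, which would be consistent with the KeyError: 0 in the traceback. The single combined .loc mask is a substitute for the chained filtering in the question, and won_rows/lost_rows are illustrative names; print() is called with one argument so the snippet also runs on Python 3.

```python
import pandas as pd

# The four-row frame printed in the question (default index 0..3).
df = pd.DataFrame({
    'result': ['Won', 'Won', 'Lost', 'Lost'],
    'my_classification': ['ENTERPRISE', 'COMMERCIAL', 'ENTERPRISE', 'COMMERCIAL'],
    'prob': [0.657895, 0.342105, 0.611842, 0.388158],
})

# Filter with one combined mask; surviving rows keep their ORIGINAL labels.
won_rows = df.loc[(df['result'] == 'Won') & (df['my_classification'] == 'ENTERPRISE'), 'prob']
lost_rows = df.loc[(df['result'] == 'Lost') & (df['my_classification'] == 'ENTERPRISE'), 'prob']
print(won_rows.index.tolist())   # [0]
print(lost_rows.index.tolist())  # [2]

# On an integer index, series[0] means "the row labelled 0", not "the first row".
print(won_rows[0])  # 0.657895 (label 0 exists)
try:
    lost_rows[0]    # label 0 is not in lost_rows' index
except KeyError as exc:
    print('KeyError: %s' % exc)

# .iloc[0] is positional, so it returns the first match in both cases.
print(lost_rows.iloc[0])  # 0.611842
```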

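Because you take categories from the entire DataFrame, but win and lost are computed on filtered subsets, and a given category does not always exist in every subset.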

For example:

  result my_classification      prob
0    Won        ENTERPRISE  0.657895
1    Won        COMMERCIAL  0.342105
2   Lost        ENTERPRISE  0.611842

If you then run

df[df.result == 'Lost'][df['my_classification'] == 'COMMERCIAL']['prob'][0]

it raises an error, because no row matches both conditions.
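The answer's point can be checked directly. A minimal sketch, assuming the three-row frame from the example above: it compares the categories taken from the whole frame with the (result, category) pairs that actually occur, then shows that selecting a missing pair gives an empty Series, so there is no element for [0] to return. The names present, missing and subset are illustrative only, and the exact exception type for an empty lookup may vary between pandas versions.

```python
import pandas as pd

# The three-row frame from the example: there is no Lost/COMMERCIAL row.
df = pd.DataFrame({
    'result': ['Won', 'Won', 'Lost'],
    'my_classification': ['ENTERPRISE', 'COMMERCIAL', 'ENTERPRISE'],
    'prob': [0.657895, 0.342105, 0.611842],
})

# Categories come from the whole frame...
categories = df['my_classification'].unique()

# ...but a (result, category) pair only exists if some row actually has it.
present = set(zip(df['result'], df['my_classification']))
missing = [(r, c) for r in ('Won', 'Lost') for c in categories
           if (r, c) not in present]
print(missing)  # [('Lost', 'COMMERCIAL')]

# Selecting that pair yields an empty Series, so there is nothing at [0].
mask = (df['result'] == 'Lost') & (df['my_classification'] == 'COMMERCIAL')
subset = df.loc[mask, 'prob']
print(subset.empty)  # True
try:
    subset[0]
except (KeyError, IndexError) as exc:
    # Recent pandas raises KeyError here; older versions may raise IndexError.
    print('lookup failed: %r' % (exc,))
```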

Use groupby instead:

df.groupby(['result','my_classification']).head(1)
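Building on the groupby suggestion, a sketch (an extension, not necessarily what the answer intended) that uses GroupBy.first() to get one prob per (result, my_classification) pair and Series.get to look pairs up, so a missing combination yields None instead of an exception. It assumes the same three-row frame as the example above; firsts is an illustrative name.

```python
import pandas as pd

# Same three-row frame as the example (no Lost/COMMERCIAL row).
df = pd.DataFrame({
    'result': ['Won', 'Won', 'Lost'],
    'my_classification': ['ENTERPRISE', 'COMMERCIAL', 'ENTERPRISE'],
    'prob': [0.657895, 0.342105, 0.611842],
})

# head(1) keeps the first row of every (result, my_classification) group
# that actually occurs; absent combinations simply produce no row.
print(df.groupby(['result', 'my_classification']).head(1))

# first() gives one value per group, indexed by (result, my_classification).
firsts = df.groupby(['result', 'my_classification'])['prob'].first()

# Series.get returns the default (None) instead of raising for a missing pair.
print(firsts.get(('Lost', 'ENTERPRISE')))  # 0.611842
print(firsts.get(('Lost', 'COMMERCIAL')))  # None
```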

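But in my case, the error occurs when no category is missing from the group. I also tried df.groupby(['result','my_classification']).head(1)... still the same error. I noticed that if I replace df['my_classification'] == c with df['my_classification'] == 'ENTERPRISE', hardcoding the category value makes the error go away... why is that?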
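A minimal sketch of the original loop that avoids both failure modes discussed in this thread, assuming the four-row frame printed in the question: the two conditions are combined in one .loc mask, the first match is read positionally with .iloc[0] (so it does not matter which index labels survived the filter), and None is returned when nothing matches. first_prob is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# The four-row frame printed in the question (default index 0..3).
df = pd.DataFrame({
    'result': ['Won', 'Won', 'Lost', 'Lost'],
    'my_classification': ['ENTERPRISE', 'COMMERCIAL', 'ENTERPRISE', 'COMMERCIAL'],
    'prob': [0.657895, 0.342105, 0.611842, 0.388158],
})


def first_prob(frame, result, category):
    """Hypothetical helper: first 'prob' for (result, category), or None."""
    # One combined mask instead of chained df[...][...] filtering.
    mask = (frame['result'] == result) & (frame['my_classification'] == category)
    probs = frame.loc[mask, 'prob']
    if probs.empty:
        return None  # no row has this combination
    # Positional access: works whatever index labels the matching rows kept.
    return probs.iloc[0]


for c in df['my_classification'].unique():
    win = first_prob(df, 'Won', c)
    lost = first_prob(df, 'Lost', c)
    print('%s: won=%s lost=%s' % (c, win, lost))
```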