Python3, pandas: creating a new column fails with KeyError


I have been using the apply method on a DataFrame to create new columns. So, if I have a df that looks like this:

stdf.columns
Index(['Username', 'First Name', 'Last Name', 'Class', 'Screens Typed','Time Spent', 'Avg Speed', 'Avg Acc'],  dtype='object')
I have been using syntax like this to create new columns:

stdf['uid'] = stdf['Username'].apply(lambda x: x[0:6]) + "-" + stdf['First Name'] + "-" + stdf['Last Name']
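Why this line works is worth spelling out, because the failing line below differs in exactly this respect. A minimal sketch, assuming made-up sample rows (only the column names come from the question): the lambda receives one username string and returns one string, so `apply` yields a Series aligned with `stdf`, and `+` then concatenates element-wise.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
stdf = pd.DataFrame({
    "Username": ["dsmith42x", "jdoe2017"],
    "First Name": ["David", "Jane"],
    "Last Name": ["Smith", "Doe"],
})

# One string in, one string out: apply() returns a Series with stdf's index.
prefix = stdf["Username"].apply(lambda x: x[0:6])

# Series + str + Series concatenates row by row, aligned on the index.
stdf["uid"] = prefix + "-" + stdf["First Name"] + "-" + stdf["Last Name"]

# The same prefix without apply, via the vectorized .str accessor.
assert prefix.tolist() == stdf["Username"].str[:6].tolist()
print(stdf["uid"].tolist())
```

The `.str[:6]` form is shown only as an equivalent alternative; the original `apply` version is correct as written.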
Today, when creating a new column the same way, I get a KeyError on the new column name:

stdf['truSpeed'] = stdf['nSpeed'].apply(lambda x: x * .1 * stdf["truAcc"])
Yes, "nSpeed" and "truAcc" do exist as columns:

Index(['Username', 'First Name', 'Last Name', 'Class', 'Screens Typed', 'Time Spent', 'Avg Speed', 'Avg Acc', 'truTime', 'uid', 'truAcc', 'nSpeed'], dtype='object')

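The KeyError points to the 'truSpeed' identifier. So my question is: why is pandas now giving me a KeyError when I try to create a new column, when this syntax has always created a new column before?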

There must be some other error that I'm not seeing.

Here is almost the full traceback:

KeyError                                  Traceback (most recent call last)
/home/david/dev/msc/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'truSpeed'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in set(self, item, value, check)
   3667         try:
-> 3668             loc = self.items.get_loc(item)
   3669         except KeyError:

/home/david/dev/msc/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'truSpeed'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-18-35d20ff4edf0> in <module>()
      4 stdf['nSpeed'] = stdf['Avg Speed'].apply(lambda x: int(x.split(" ")[0]))
      5 print(stdf.columns)
----> 6 stdf['truSpeed'] = stdf['nSpeed'].apply(lambda x: x * .1 * stdf["truAcc"])
      7 # stdf['truSpeed']
      8 # print(stdf.columns)

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   2417         else:
   2418             # set column
-> 2419             self._set_item(key, value)
   2420 
   2421     def _setitem_slice(self, key, value):

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   2484         self._ensure_valid_index(value)
   2485         value = self._sanitize_column(key, value)
-> 2486         NDFrame._set_item(self, key, value)
   2487 
   2488         # check if we are modifying a copy

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/generic.py in _set_item(self, key, value)
   1498 
   1499     def _set_item(self, key, value):
-> 1500         self._data.set(key, value)
   1501         self._clear_item_cache()
   1502 

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in set(self, item, value, check)
   3669         except KeyError:
   3670             # This item wasn't present, just insert at end
-> 3671             self.insert(len(self.items), item, value)
   3672             return
   3673 

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in insert(self, loc, item, value, allow_duplicates)
   3770 
   3771         block = make_block(values=value, ndim=self.ndim,
-> 3772                            placement=slice(loc, loc + 1))
   3773 
   3774         for blkno, count in _fast_count_smallints(self._blknos[loc:]):

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   2683                      placement=placement, dtype=dtype)
   2684 
-> 2685     return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
   2686 
   2687 # TODO: flexible with index=None and/or items=None

/home/david/dev/msc/lib/python3.5/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
    107             raise ValueError('Wrong number of items passed %d, placement '
    108                              'implies %d' % (len(self.values),
--> 109                                              len(self.mgr_locs)))
    110 
    111     @property

ValueError: Wrong number of items passed 58, placement implies 1
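The last exception is the one that matters. A minimal sketch of what the failing lambda actually returns, assuming a hypothetical 3-row stand-in for `stdf` (the real frame has 58 rows): `x` is a single `nSpeed` value, but `stdf["truAcc"]` is the whole column, so each call yields a full Series rather than one number.

```python
import pandas as pd

# Hypothetical 3-row stand-in for stdf; the real frame has 58 rows.
stdf = pd.DataFrame({"nSpeed": [40, 55, 62], "truAcc": [0.9, 0.8, 0.95]})

f = lambda x: x * .1 * stdf["truAcc"]

# x is one scalar from 'nSpeed', but stdf["truAcc"] is the entire column,
# so a single call returns a whole Series (3 values here, 58 in the question).
one_call = f(stdf["nSpeed"].iloc[0])
print(type(one_call).__name__, len(one_call))  # Series 3

# In the pandas version from the traceback, apply() expands these per-row
# Series into an n-by-n DataFrame; storing that in the single column
# 'truSpeed' is what raises "Wrong number of items passed 58, placement implies 1".
expanded = stdf["nSpeed"].apply(f)
print(expanded.shape)
```

The shape printed for `expanded` may vary with the pandas version; the key point is that one call already returns `len(stdf)` values.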
It should be:

stdf['truSpeed'] = stdf.eval('nSpeed * truAcc * .1')
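A minimal sketch of the `eval` form on hypothetical sample values: `DataFrame.eval` parses the string and resolves the bare names `nSpeed` and `truAcc` as column names, so the product is computed for whole columns at once.

```python
import pandas as pd

# Hypothetical sample values; only the column names come from the question.
stdf = pd.DataFrame({"nSpeed": [40, 55, 62], "truAcc": [0.9, 0.8, 0.95]})

# Bare names in the expression are looked up as columns of stdf,
# and the result is a Series with one value per row.
stdf["truSpeed"] = stdf.eval("nSpeed * truAcc * .1")
print(stdf)
```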

or, the slow way:

stdf['truSpeed'] = stdf.apply(lambda x: x['nSpeed'] * x['truAcc'] * .1, axis=1)
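Here is why this version works where the original did not, sketched on hypothetical sample values: with `axis=1` the lambda receives one row at a time (a Series indexed by column name), so `x['nSpeed']` and `x['truAcc']` are scalars and each call returns one number. It is "slow" because the lambda runs once per row in Python.

```python
import pandas as pd

# Hypothetical sample values; only the column names come from the question.
stdf = pd.DataFrame({"nSpeed": [40, 55, 62], "truAcc": [0.9, 0.8, 0.95]})

# axis=1: x is a single row, so x["nSpeed"] and x["truAcc"] are scalars
# and the lambda returns exactly one value per row.
stdf["truSpeed"] = stdf.apply(lambda x: x["nSpeed"] * x["truAcc"] * .1, axis=1)
print(stdf)
```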

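Thanks to piRSquared for the simpler syntax. As mentioned in the comments, the df.eval syntax was new to me, and it works. However, the syntax that seems to "fit" the paradigm used by Excel spreadsheets is a third one: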

stdf['truSpeed'] = stdf['nSpeed'] * stdf['truAcc'] * .1
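A quick check, on hypothetical sample values, that the plain column arithmetic gives the same result as the `eval` and row-wise `apply` forms above. `*` between two Series multiplies element-wise, aligned on the index, with no per-row Python calls.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; only the column names come from the question.
stdf = pd.DataFrame({"nSpeed": [40, 55, 62], "truAcc": [0.9, 0.8, 0.95]})

# Element-wise product of the two columns, then scaled by 0.1.
stdf["truSpeed"] = stdf["nSpeed"] * stdf["truAcc"] * .1

# Cross-check against the other two forms from this answer.
via_eval = stdf.eval("nSpeed * truAcc * .1")
via_apply = stdf.apply(lambda x: x["nSpeed"] * x["truAcc"] * .1, axis=1)
print(np.allclose(stdf["truSpeed"], via_eval), np.allclose(stdf["truSpeed"], via_apply))
```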

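I think the KeyError I originally got must have been caused by some other error, since using the identifier "truSpeed" should simply create a new column in the DataFrame. The traceback bears this out: `KeyError: 'truSpeed'` is only pandas checking whether the column already exists (the code comment there reads "This item wasn't present, just insert at end"), and the exception that actually stops the assignment is the last one, `ValueError: Wrong number of items passed 58, placement implies 1`. In the original lambda, x is a single "nSpeed" value but stdf["truAcc"] is the entire 58-row column, so every call returns 58 numbers instead of one.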
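The "During handling of the above exception, another exception occurred" lines come from Python's implicit exception chaining. A simplified, hypothetical imitation of the lookup-then-insert pattern in the traceback (this is not pandas' real code; `set_column` and its arguments are made up for illustration) shows why a harmless KeyError gets printed above the real ValueError:

```python
def set_column(columns: dict, name: str, n_values: int) -> None:
    """Hypothetical stand-in for pandas' internal set(); not the real code."""
    try:
        columns[name]  # look for an existing column with this name
    except KeyError:
        # Not present yet, so insert at the end: the normal path for a new
        # column. A ValueError raised here is chained to the KeyError.
        if n_values != 1:
            raise ValueError(
                f"Wrong number of items passed {n_values}, placement implies 1"
            )
        columns[name] = [None]


cols = {"nSpeed": [], "truAcc": []}
set_column(cols, "ok", 1)  # a new one-column value is simply inserted

try:
    set_column(cols, "truSpeed", 58)
except ValueError as err:
    caught = err

# The ValueError remembers the KeyError it was raised while handling.
print(type(caught).__name__, "| context:", repr(caught.__context__))
```

Only the final exception in such a chain is the real failure; the earlier KeyError is part of the normal "column does not exist yet" path.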

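@piRSquared: Thanks. The df.eval syntax is new to me; I think I may have been looking at old pandas documentation. The old slow way at the end of the list still throws the KeyError. I'm now guessing that a different exception is being thrown and the KeyError is masking it. After looking at the apply lambda syntax, I realized I don't need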
stdf['truSpeed'] = stdf.apply(lambda x: x['nSpeed'] * x['truAcc'] * .1, axis=1)
stdf['truSpeed'] = stdf['nSpeed'] * stdf['truAcc'] * .1