Python "KeyError" in a DataFrame during tokenization

python pandas dataframe

For some reason, the following code throws an error:

ps = PorterStemmer()
tokens = []
for i in range(0,len(df)):
    tweet = str(df['clean_tweet'][i])
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    tweet = ' '.join(tweet)
    tokens.append(tweet)
    print(tokens[i])
df['clean_tweet'] = tokens
df.head()

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._unpack_bool_indexer()

KeyError: 31962

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-30-7794ad45df60> in <module>
      2 tokens = []
      3 for i in range(0,len(df)):
----> 4     tweet = str(df['clean_tweet'][i])
      5     tweet = tweet.lower()
      6     tweet = tweet.split()

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 31962

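I don't know why this error occurs. The DataFrame has a shape of 56745 rows × 4 columns, and the code is clearly able to convert each tweet into a tokenized tweet, so I suspect the KeyError happens when I overwrite the DataFrame column with the list of tokens.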

KeyError: 31962 suggests that the DataFrame's index is not contiguous and the label 31962 is missing. You can try apply() on the Series instead:

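# ps (PorterStemmer) and stopWords are the objects defined in the question
def clean_tweet(tweet):
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    return ' '.join(tweet)

df['clean_tweet'] = df['clean_tweet'].apply(clean_tweet)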
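
To see why a gap in the index produces this kind of error, here is a minimal sketch on a made-up four-row frame (the data and the dropped row are assumptions for illustration, not the asker's data). With an integer index, `df['clean_tweet'][i]` looks up the label `i` rather than the i-th position, so a row dropped by earlier filtering leaves a label that `range(len(df))` still tries to reach.

```python
import pandas as pd

# Hypothetical data: four tweets, then one row dropped, as earlier cleaning might do.
df = pd.DataFrame(
    {"clean_tweet": ["good morning", "bad day", "nice weather", "see you"]}
)
df = df.drop(index=1)  # remaining index labels: 0, 2, 3

# Label lookup: label 1 no longer exists, so this raises KeyError.
try:
    df["clean_tweet"][1]
    missing_label_error = None
except KeyError as err:
    missing_label_error = err

# Option 1: positional access with .iloc does not depend on the labels.
second_by_position = df["clean_tweet"].iloc[1]

# Option 2: rebuild a contiguous 0..n-1 index; label i then matches position i.
df_reset = df.reset_index(drop=True)
second_after_reset = df_reset["clean_tweet"][1]
```

Either option would probably let the original loop run, but `apply()` as shown above avoids indexing altogether.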

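Is the error raised by the for loop or after it? I think it is after the loop, because it prints the tokens just fine.

Try printing i at the start of the loop to see which number/row you reach. I also don't think this is the best practice for building clean_tweet. Have you tried using for tweet in df['clean_tweet']: instead of the loop?

Yes.. it worked. Thank you.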
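
The suggestion to iterate over the column directly can be sketched as follows. This is only an illustration under stated assumptions: `stop_words` and `stem` are hypothetical stand-ins for the question's `stopWords` and `ps.stem` (so the snippet runs without NLTK), and the three tweets and their index labels are invented.

```python
import pandas as pd

# Hypothetical stand-in for the question's stopWords collection.
stop_words = {"the", "a", "is"}

def stem(word):
    # Crude placeholder, NOT a real stemmer: strips one trailing "s".
    return word[:-1] if word.endswith("s") else word

# Index labels are deliberately non-contiguous (5, 9, 12).
df = pd.DataFrame(
    {"clean_tweet": ["The cats sleep", "A dog barks", "Rain is falling"]},
    index=[5, 9, 12],
)

tokens = []
for tweet in df["clean_tweet"]:  # iterates over the values, never looks up a label
    words = str(tweet).lower().split()
    tokens.append(" ".join(stem(w) for w in words if w not in stop_words))

# Assigning a plain list fills the column by position; the labels are unchanged.
df["clean_tweet"] = tokens
```

Because the loop never indexes by `i`, gaps in the index labels no longer matter, and the final list assignment only requires that `tokens` has one entry per row.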
