Python: "KeyError" in a DataFrame during tokenization

The following code throws an error for some reason:

ps = PorterStemmer()
tokens = []
for i in range(0,len(df)):
    tweet = str(df['clean_tweet'][i])
    tweet = tweet.lower()
    tweet = tweet.split()
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    tweet = ' '.join(tweet)
    tokens.append(tweet)
    print(tokens[i])
df['clean_tweet'] = tokens
df.head()

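I don't know why this error occurs. The DataFrame has a shape of 56745 rows × 4 columns, and the code is clearly able to convert each tweet into a tokenized tweet, so I think the KeyError may occur when I overwrite the DataFrame column with the list of tokens.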

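KeyError: 31962

The DataFrame's index is probably not contiguous, and the label 31962 is missing from it. With an integer index, df['clean_tweet'][i] looks up the label i rather than the i-th position, so any gap in the index (for example, after filtering or concatenating rows) raises this error. You can try apply() on the Series instead: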

def clean_tweet(tweet):
    tweet = tweet.lower()
    tweet = tweet.split()
    # Stem every word that is not a stop word
    tweet = [ps.stem(word) for word in tweet if word not in stopWords]
    return ' '.join(tweet)

df['clean_tweet'] = df['clean_tweet'].apply(clean_tweet)
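
To make the explanation above concrete, here is a minimal sketch that reproduces the error on a toy DataFrame and shows two other ways around it. The data is hypothetical and the text processing is reduced to lower() so that the example needs only pandas. It assumes the gap in the index came from dropping a row, which may not be how the original DataFrame got its gaps.

```python
import pandas as pd

# Hypothetical toy data; dropping a row leaves a gap in the index labels.
df = pd.DataFrame({"clean_tweet": ["Hello World", "drop me", "Good Morning"]})
df = df.drop(index=1)  # remaining index labels: 0 and 2

# Label-based lookup: i = 1 is within range(len(df)) but is not a label.
try:
    for i in range(len(df)):
        df["clean_tweet"][i]
    raised = False
except KeyError:
    raised = True

# Option 1: look rows up by position with .iloc.
by_position = [df["clean_tweet"].iloc[i].lower() for i in range(len(df))]

# Option 2: rebuild a contiguous 0..n-1 index first.
fixed = df.reset_index(drop=True)
by_label = [fixed["clean_tweet"][i].lower() for i in range(len(fixed))]
```

Both options leave the loop structure intact; apply() is usually the shorter choice because it never touches the index at all.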

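Is the error raised inside the for loop or after it? I think it happens after the loop, because the tokens print fine.

Try printing i at the start of the loop to see which number/row you reach. I also don't think this loop is the best practice for building clean_tweet. Have you tried using for tweet in df['clean_tweet'] instead of looping over range(len(df))?

Yes, it worked. Thank you.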
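
The two suggestions in the comments can be sketched as follows: first list which labels in range(len(df)) are absent from the index, then iterate over the column's values directly so that no label lookup takes place. The toy data and the STOP_WORDS set are hypothetical stand-ins, and stemming is left out so the sketch depends only on pandas.

```python
import pandas as pd

# Hypothetical stand-in for the stopWords collection used in the question.
STOP_WORDS = {"the", "is", "a"}

# Hypothetical toy data with a gap in the index (labels 0, 1, 5).
df = pd.DataFrame(
    {"clean_tweet": ["The sky is blue", "x", "A quiet night"]},
    index=[0, 1, 5],
)

# Labels the original loop would request but the index does not contain.
missing = sorted(set(range(len(df))) - set(df.index))

# Iterate over the values themselves; the index is never consulted.
tokens = []
for tweet in df["clean_tweet"]:
    words = [w for w in str(tweet).lower().split() if w not in STOP_WORDS]
    tokens.append(" ".join(words))

# A list of length len(df) is assigned by position, so the gap is harmless.
df["clean_tweet"] = tokens
```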
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._maybe_get_bool_indexer()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._unpack_bool_indexer()

KeyError: 31962

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-30-7794ad45df60> in <module>
      2 tokens = []
      3 for i in range(0,len(df)):
----> 4     tweet = str(df['clean_tweet'][i])
      5     tweet = tweet.lower()
      6     tweet = tweet.split()

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~\anaconda3\envs\machine_learning\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 31962