Python: error when trying to access a row index


I have a DataFrame named clean that is split into two samples, train_data and test_data, with the following code:

train_data = clean.sample(frac=0.75)
test_data = clean.drop(train_data.index)

I am trying to build a word-frequency DataFrame from the train_data DataFrame. I started with this code:

from collections import defaultdict as dct

phrases = []
for word in train_data['Message']:
    phrases.append(word.split())
    
ham = dct(int)
spam = dct(int)
    
for i in range(len(phrases)):
    if train_data['Category'][i] == 'ham':
        print(train_data['Category'][i])
    elif train_data['Category'][i] == 'spam':
        print(train_data['Category'][i])

However, whenever the index i is not present in train_data, I get an error on the line if train_data['Category'][i] == 'ham'::

KeyError                                  Traceback (most recent call last)
~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 5

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-97-17de52f682b3> in <module>
      9 
     10 for i in range(len(phrases)):
---> 11     if train_data['Category'][i] == 'ham':
     12         print(train_data['Category'][i])
     13     elif train_data['Category'][i] == 'spam':

~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 5

What is going wrong?

See the pandas documentation for .loc (label-based indexing) and .iloc (position-based indexing).

You can try using

if train_data['Category'].iloc[i] == 'ham':

The modified code would be:

for i in range(len(phrases)):
    if train_data['Category'].iloc[i] == 'ham':
        print(train_data['Category'].iloc[i])
    elif train_data['Category'].iloc[i] == 'spam':
        print(train_data['Category'].iloc[i])
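
To see why .iloc helps, here is a minimal sketch on a made-up Series. The labels 3, 1, 4 and the values are invented for illustration and are not the question's data. It contrasts label-based lookup, which is what plain [] does on an integer index, with position-based lookup.

```python
import pandas as pd

# Hypothetical stand-in for train_data['Category'] after sampling:
# the index labels 3, 1, 4 are leftovers from the original frame.
category = pd.Series(["ham", "spam", "ham"], index=[3, 1, 4])

# Position-based lookup: valid for every i in 0 .. len(category) - 1.
by_position = [category.iloc[i] for i in range(len(category))]

# Label-based lookup: category[0] searches for the label 0, which is absent.
try:
    category[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(by_position)
print(label_lookup_failed)
```

The loop over range(len(...)) only produces positions, so it pairs naturally with .iloc and not with plain [] on a shuffled integer index.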

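KeyError: 5

means that no row with the index label 5 exists. With an integer index, train_data['Category'][i] looks up the label i, not the i-th row. .sample() keeps the index labels of the original DF, and row 5 may simply not have been picked for the sample.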

Example DF:

   letter
0     A
1     B
2     C
3     D
4     E
5     F

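sampled = df.sample(frac=0.5)

If you then iterate over sampled with for x in range(...), a label such as 0 may be missing from the sample, and looking it up raises a KeyError.
You can call .reset_index() after .sample() to renumber the rows (drop=True discards the old labels; without it they are kept as an extra column named index):

sampled = df.sample(frac=0.5).reset_index(drop=True)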
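
The behaviour described here can be checked with a short runnable sketch. The random_state argument is an assumption added only to make the run repeatable; which rows get picked is still arbitrary, so no particular output is claimed.

```python
import pandas as pd

df = pd.DataFrame({"letter": list("ABCDEF")})

# random_state is added here only so that repeated runs pick the same rows.
sampled = df.sample(frac=0.5, random_state=0)

# The sampled rows keep their original labels, so they are generally
# not 0, 1, 2 and range(len(sampled)) cannot be used as labels.
print(sampled.index.tolist())

# drop=True discards the old labels instead of storing them in a column.
renumbered = sampled.reset_index(drop=True)
print(renumbered.index.tolist())

# After the reset, every position is also a valid label.
letters = [renumbered["letter"][i] for i in range(len(renumbered))]
print(letters)
```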
In summary, a few suggestions:
Do not iterate over the rows of a DF. Prefer vectorized operations where you can.
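
As one sketch of what a vectorized version could look like, the loop over range(len(...)) can be replaced by a boolean mask or by value_counts(). The small frame below is invented for illustration; only the column names Category and Message come from the question.

```python
import pandas as pd

# Made-up data with a non-contiguous index, as left behind by .sample().
train_data = pd.DataFrame(
    {
        "Category": ["ham", "spam", "ham"],
        "Message": ["see you soon", "win cash now", "call me later"],
    },
    index=[7, 2, 5],
)

# Boolean mask: compares the whole column at once, with no per-row lookups.
is_ham = train_data["Category"] == "ham"
ham_messages = train_data.loc[is_ham, "Message"]

# Count the rows in each category with a single call.
counts = train_data["Category"].value_counts()

print(ham_messages.tolist())
print(counts.to_dict())
```

Because nothing here indexes by an integer i, the shuffled index labels never matter.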

To build the word-frequency counts, you can use Counter from collections.
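
A minimal sketch of that idea follows, again on invented data. The helper name word_counts is hypothetical, and splitting on whitespace mirrors the word.split() call in the question; this is one possible approach rather than the only one.

```python
from collections import Counter

import pandas as pd

# Made-up messages; only the column names come from the question.
train_data = pd.DataFrame(
    {
        "Category": ["ham", "spam", "ham", "spam"],
        "Message": ["see you soon", "win cash now", "see you later", "win now"],
    },
    index=[7, 2, 5, 9],
)


def word_counts(messages):
    """Count whitespace-separated words across an iterable of strings."""
    counter = Counter()
    for message in messages:
        counter.update(message.split())
    return counter


# Select each category with a boolean mask, then count its words.
ham = word_counts(train_data.loc[train_data["Category"] == "ham", "Message"])
spam = word_counts(train_data.loc[train_data["Category"] == "spam", "Message"])

# Combine into one word-frequency DataFrame; words missing from a class get 0.
freq = pd.DataFrame({"ham": pd.Series(ham), "spam": pd.Series(spam)})
freq = freq.fillna(0).astype(int)
print(freq)
```

Iterating over the selected Message values directly avoids any lookup by row number, so the KeyError from the question cannot occur.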

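For reference, one possible result of sampling the example DF above. The sampled rows keep their original index labels:

   letter
3    D
1    B
4    E

After resetting the index, the same rows are labelled 0, 1, 2:

   letter
0    D
1    B
2    E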