Python: error when trying to access a row index
I have a DataFrame named clean that is split into two samples, train_data and test_data, with the following code:
train_data = clean.sample(frac=0.75)
test_data = clean.drop(train_data.index)
I am trying to build a word-frequency DataFrame from the train_data DataFrame.
I started with this code:
from collections import defaultdict as dct

phrases = []
for word in train_data['Message']:
    phrases.append(word.split())

ham = dct(int)
spam = dct(int)

for i in range(len(phrases)):
    if train_data['Category'][i] == 'ham':
        print(train_data['Category'][i])
    elif train_data['Category'][i] == 'spam':
        print(train_data['Category'][i])
However, whenever the index i is not present in train_data, the line if train_data['Category'][i] == 'ham': raises an error:
KeyError Traceback (most recent call last)
~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 5
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-97-17de52f682b3> in <module>
9
10 for i in range(len(phrases)):
---> 11 if train_data['Category'][i] == 'ham':
12 print(train_data['Category'][i])
13 elif train_data['Category'][i] == 'spam':
~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in __getitem__(self, key)
851
852 elif key_is_scalar:
--> 853 return self._get_value(key)
854
855 if is_hashable(key):
~/Library/Python/3.8/lib/python/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
959
960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
962 return self.index._get_values_for_loc(self, loc, label)
963
~/Library/Python/3.8/lib/python/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 5
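What is the problem here?
See the pandas documentation on .loc and .iloc. You can try using
if train_data['Category'].iloc[i] == 'ham'
The modified code will be:
for i in range(len(phrases)):
    if train_data['Category'].iloc[i] == 'ham':
        print(train_data['Category'].iloc[i])
    elif train_data['Category'].iloc[i] == 'spam':
        print(train_data['Category'].iloc[i])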
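If the loop counter is not needed for anything else, another option is to iterate over the column values directly, which avoids index lookups altogether. A minimal sketch, assuming a small made-up stand-in for train_data with the same Category and Message columns (the rows and index labels are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for train_data. The index is deliberately
# non-contiguous, as it would be after clean.sample(...).
train_data = pd.DataFrame(
    {
        "Category": ["ham", "spam", "ham"],
        "Message": ["see you soon", "win a free prize", "call me later"],
    },
    index=[7, 2, 5],
)

labels_seen = []
# zip walks both columns in row order; no label or position lookup is involved.
for category, message in zip(train_data["Category"], train_data["Message"]):
    words = message.split()
    labels_seen.append((category, len(words)))

print(labels_seen)
```

Because nothing is looked up by label, the loop works whatever the index of train_data looks like.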
KeyError: 5 means that no row with index label 5 exists. This is because .sample() keeps the index of the original DF, and row 5 may not have been picked.
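Example DF:
  letter
0      A
1      B
2      C
3      D
4      E
5      F
sampled = df.sample(frac=0.5)
If you try to iterate over sampled with for x in range(...), a label such as 0 may not exist in the sample, and looking it up will raise an error.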
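The difference between label-based and position-based lookup can be reproduced in a few lines. In this sketch the rows are picked explicitly with df.loc[[3, 1, 4]] instead of .sample(), purely so that the result is the same on every run; that substitution and the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"letter": list("ABCDEF")})

# Stand-in for df.sample(frac=0.5): select rows 3, 1 and 4 explicitly
# so the example is reproducible. The original labels are kept.
sampled = df.loc[[3, 1, 4]]
print(list(sampled.index))

# Label-based lookup: there is no row labelled 0, so this raises KeyError.
try:
    sampled["letter"][0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup: the first row, whatever its label happens to be.
first_letter = sampled["letter"].iloc[0]
print(label_lookup_failed, first_letter)
```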
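You can use .reset_index() after .sample(). Passing drop=True discards the old labels instead of keeping them as an extra column:
sampled = df.sample(frac=0.5).reset_index(drop=True)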
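Applied to the train/test split from the question, the order of operations matters: clean.drop(train_data.index) relies on the original labels, so the index should only be reset after the split. A hedged sketch, where the eight-row clean frame and random_state=0 are assumptions added to keep the example small and repeatable:

```python
import pandas as pd

# Hypothetical stand-in for the clean DataFrame.
clean = pd.DataFrame({"letter": list("ABCDEFGH")})

train_data = clean.sample(frac=0.75, random_state=0)
# Remove the training rows while the original labels are still in place.
test_data = clean.drop(train_data.index)

# Only now renumber both frames from 0. drop=True discards the old
# labels rather than storing them in a new "index" column.
train_data = train_data.reset_index(drop=True)
test_data = test_data.reset_index(drop=True)

# Labels now run 0..len-1, so range-based lookups no longer fail.
for i in range(len(train_data)):
    train_data["letter"][i]
```

Resetting train_data before calling clean.drop(...) would make the drop remove rows 0 to 5 instead of the rows that were actually sampled.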
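In summary, some suggestions:
Do not iterate over the rows of a DF. Try to use vectorized operations instead.
To build word-frequency counts, you can use Counter from collections.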
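Both suggestions can be combined: a boolean mask selects the messages of one category without a row loop, and Counter tallies the words. This is only a sketch under assumptions; the sample rows and the helper name word_counts are hypothetical and not part of the original question.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for train_data with a shuffled index.
train_data = pd.DataFrame(
    {
        "Category": ["ham", "spam", "ham", "spam"],
        "Message": [
            "see you soon",
            "win a free prize",
            "see you later",
            "free prize inside",
        ],
    },
    index=[7, 2, 5, 0],
)


def word_counts(frame, category):
    """Count words over all messages of one category (illustrative helper)."""
    # Boolean mask: selects rows by value, independent of index labels.
    messages = frame.loc[frame["Category"] == category, "Message"]
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts


ham = word_counts(train_data, "ham")
spam = word_counts(train_data, "spam")

# One row per word, one column per category; words absent from a
# category get a count of 0.
word_freq = (
    pd.DataFrame({"ham": pd.Series(ham), "spam": pd.Series(spam)})
    .fillna(0)
    .astype(int)
)
print(word_freq)
```

Since the rows are selected by a mask rather than by position or label, the shuffled index left behind by .sample() does not matter here.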
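Output of the sampling example above:
sampled, straight after .sample() (original index labels kept):
  letter
3      D
1      B
4      E
sampled, after resetting the index:
  letter
0      D
1      B
2      E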