Removing stop words with nltk fails when filtering a list from a dataframe column
I have a dataframe with string entries, and I'm using a function to remove stop words. The cell runs without errors, but it doesn't produce the expected result.
df['column'].iloc[0] = 'BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER'
def text_process(text):
    try:
        nopunc = [char for char in text if char not in sting.punctuation]
        nopunc = ' '.join(nopunc)
        return [word for word in nopunc.split() if word.lower not in stopwords.words('english')]
    except TypeError:
        return []
df['column'].apply(text_process)
The result for the first cell looks like this:
['BK ', 'HE', 'HAS', 'KITCHEN', 'TROUBLE', 'WITH', 'HIS', 'BLENDER']
('HE', 'HAS', 'WITH', 'HIS') should have been removed, but they still appear in the cell. Can someone explain why this happens, or how to fix it?
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
example_sent = "BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER"
example_sent=example_sent.lower()
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)
filtered_sentence = [w for w in word_tokens if not w in stop_words]
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)
print(word_tokens)
print(filtered_sentence)
['bk','he','has','kitchen','trouble','with','his','blender']
['bk','kitchen','trouble','blender']