Python: how to remove English and Spanish stopwords


python nlp


I'm trying to remove both English and Spanish stopwords. My code works for English but not for Spanish:

stopword = nltk.corpus.stopwords.words('english', 'spanish')

def remove_stopwords(text):
    text = [word for word in text if word not in stopword]
    return text
    
df['Tweet_nonstop'] = df['Tweet_tokenized'].apply(lambda x: remove_stopwords(x))

Can anyone help me fix this? Thanks.

To get both the English and the Spanish stopwords, load each list separately and concatenate them:

import nltk

stopword_en = nltk.corpus.stopwords.words('english')
stopword_es = nltk.corpus.stopwords.words('spanish')
stopword = stopword_en + stopword_es  # combined English + Spanish list
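Applied to the question's tokenized tweets, the fix might look like the sketch below. It is not from the original answer: the two short lists are hypothetical stand-ins for the real NLTK lists (so the snippet runs without downloading corpus data), and lowercasing each token before the lookup is an assumption about the tweets, which often contain capitalized words while the NLTK lists are lowercase.

```python
# Hypothetical stand-ins for nltk.corpus.stopwords.words('english') and
# nltk.corpus.stopwords.words('spanish'); the real lists are much longer.
STOPWORDS_EN = ["the", "and", "is", "on", "no", "a"]
STOPWORDS_ES = ["el", "la", "y", "es", "no", "de"]

# A frozenset gives fast membership tests and keeps only one copy of words
# that appear in both lists (in these samples, "no"), whereas plain list
# concatenation keeps both copies.
STOPWORDS = frozenset(STOPWORDS_EN) | frozenset(STOPWORDS_ES)


def remove_stopwords(tokens):
    """Return the tokens that are not English or Spanish stopwords.

    Tokens are lowercased before the lookup (an assumption about the input:
    the stopword lists are lowercase, tweets often are not).
    """
    return [word for word in tokens if word.lower() not in STOPWORDS]


tweets = [
    ["The", "cat", "is", "on", "the", "mat"],
    ["El", "perro", "y", "la", "casa"],
]
cleaned = [remove_stopwords(t) for t in tweets]
print(cleaned)  # [['cat', 'mat'], ['perro', 'casa']]
```

With the DataFrame from the question, the wrapper lambda is unnecessary: df['Tweet_tokenized'].apply(remove_stopwords) passes each token list to the function directly.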

As the help output shows, the second argument of nltk.corpus.stopwords.words is not another language but ignore_lines_startswith:

>>> help(nltk.corpus.stopwords.words)
Help on method words in module nltk.corpus.reader.wordlist:

words(fileids=None, ignore_lines_startswith='\n') method of nltk.corpus.reader.wordlist.WordListCorpusReader instance

The first argument, fileids, can take multiple values, so a call like

nltk.corpus.stopwords.words(fileids=('english', 'spanish'))

also works.
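To see why words('english', 'spanish') silently returns only English while a tuple or list of fileids returns both, here is a simplified stand-in modeled on the signature in the help output above. It is a sketch, not NLTK's actual source, and the file contents are hypothetical, abbreviated stopword files.

```python
# Simplified stand-in for NLTK's WordListCorpusReader.words(); the file
# contents below are hypothetical, abbreviated stopword files.
FILES = {
    "english": "the\nand\nis\n",
    "spanish": "el\nla\ny\n",
}


def raw(fileids):
    """Concatenate the text of one or more files."""
    if isinstance(fileids, str):
        fileids = [fileids]
    return "".join(FILES[f] for f in fileids)


def words(fileids=None, ignore_lines_startswith="\n"):
    """Mimic words(): one word per line, skipping lines with a given prefix."""
    if fileids is None:
        fileids = list(FILES)  # no fileids: read every file
    return [
        line
        for line in raw(fileids).splitlines()
        if line and not line.startswith(ignore_lines_startswith)
    ]


# 'spanish' is bound to ignore_lines_startswith, so only English is read.
print(words("english", "spanish"))    # ['the', 'and', 'is']
# Passing several fileids reads every listed file.
print(words(("english", "spanish")))  # ['the', 'and', 'is', 'el', 'la', 'y']
print(words(["english", "spanish"]))  # ['the', 'and', 'is', 'el', 'la', 'y']
```

Because no English stopword starts with the string 'spanish', the misplaced argument filters nothing, which is why the original code raises no error and simply ignores Spanish.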

In addition to the answers above, you can also pass the languages as a list:

from nltk.corpus import stopwords
stopwords.words(['english', 'spanish'])