Python: SpaCy's is_stop doesn't identify stop words?

When I use SpaCy to identify stop words, it doesn't work with the en_core_web_lg corpus, but it does work with en_core_web_sm. Is this a bug, or am I doing something wrong?

import spacy
nlp = spacy.load('en_core_web_lg')

doc = nlp(u'The cat ran over the hill and to my lap')

for word in doc:
    print(f' {word} | {word.is_stop}')

Result:

 The | False
 cat | False
 ran | False
 over | False
 the | False
 hill | False
 and | False
 to | False
 my | False
 lap | False

However, when I change this line to use the en_core_web_sm corpus, I get different results:

nlp = spacy.load('en_core_web_sm')

 The | False
 cat | False
 ran | False
 over | True
 the | True
 hill | False
 and | True
 to | True
 my | True
 lap | False

Try importing STOP_WORDS from spacy.lang.en.stop_words; you can then explicitly check whether each word is in the set:

from spacy.lang.en.stop_words import STOP_WORDS
import spacy

nlp = spacy.load('en_core_web_lg')

doc = nlp(u'The cat ran over the hill and to my lap')

for word in doc:
    # Have to convert Token type to String, otherwise types won't match
    print(f' {word} | {str(word) in STOP_WORDS}')

This produces the following output:

 The | False
 cat | False
 ran | False
 over | True
 the | True
 hill | False
 and | True
 to | True
 my | True
 lap | False
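
The "The | False" row above comes from the membership test being case-sensitive: the entries in STOP_WORDS are lowercase. A minimal sketch of a case-insensitive check follows. The helper name is_stop_word and the small SAMPLE_STOP_WORDS set are illustrative assumptions, not part of spaCy; in practice you would pass spaCy's STOP_WORDS instead.

```python
def is_stop_word(text, stop_words):
    """Return True if text, compared case-insensitively, is in stop_words."""
    return text.lower() in stop_words


# Illustrative subset only; with spaCy installed, use
# `from spacy.lang.en.stop_words import STOP_WORDS` and pass that set instead.
SAMPLE_STOP_WORDS = {"the", "over", "and", "to", "my"}

for word in "The cat ran over the hill and to my lap".split():
    print(f" {word} | {is_stop_word(word, SAMPLE_STOP_WORDS)}")
```

With a spaCy Doc, the same check can be applied to each token's text, for example is_stop_word(token.text, STOP_WORDS).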

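It looks like a bug to me. However, this approach also gives you the flexibility to add words to the STOP_WORDS set if needed.

The suggested workaround is as follows: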

import spacy
from spacy.lang.en.stop_words import STOP_WORDS

nlp = spacy.load('en_core_web_lg')
for word in STOP_WORDS:
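    # Flag the lowercase, capitalized, and uppercase form of each stop word.
    # word.capitalize() is used because word[0].capitalize() returns only the first letter.
    for w in (word, word.capitalize(), word.upper()):
        lex = nlp.vocab[w]
        lex.is_stop = True

doc = nlp(u'The cat ran over the hill and to my lap')

for word in doc:
    print('{} | {}'.format(word, word.is_stop))

Output:
The | True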
cat | False
ran | False
over | True
the | True
hill | False
and | True
to | True
my | True
lap | False
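
The workaround relies on flagging several case variants of each stop word in the vocabulary. As a minimal sketch of that step in isolation (case_variants is a hypothetical helper, not a spaCy function), note that capitalizing only the first character, as in word[0].capitalize(), returns a single letter, whereas word.capitalize() returns the whole word with its first letter in upper case:

```python
def case_variants(word):
    """Return the lowercase, capitalized, and uppercase forms of a stop word."""
    lower = word.lower()
    return (lower, lower.capitalize(), lower.upper())


print(case_variants("the"))    # ('the', 'The', 'THE')
print("the"[0].capitalize())   # T (only the first letter, not 'The')
```

Each variant could then be looked up in nlp.vocab and flagged with is_stop = True, as in the workaround above.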