spaCy noun chunks produce unexpected lemma, pos, tag, and dep


I am using spaCy to parse documents and, unfortunately, I cannot get noun chunks to be processed the way I expected. Here is my code:

# Import spacy
import spacy
nlp = spacy.load("en_core_web_lg")

# Add noun chunk merging to the pipeline
# (spaCy v2 API; in spaCy v3 this is nlp.add_pipe("merge_noun_chunks"))
merge_noun_chunks = nlp.create_pipe("merge_noun_chunks")
nlp.add_pipe(merge_noun_chunks)

# Process the document
docs = nlp.pipe(["The big dogs chased the fast cat"])

# Print out the tokens
for doc in docs:
    for token in doc:
        # Five placeholders, five arguments (the stray undefined `tname` is removed)
        print("text: {}, lemma: {}, pos: {}, tag: {}, dep: {}".format(token.text, token.lemma_, token.pos_, token.tag_, token.dep_))

This is the output I get:

text: The big dogs, lemma: the, pos: NOUN, tag: NNS, dep: nsubj
text: chased, lemma: chase, pos: VERB, tag: VBD, dep: ROOT
text: the fast cat, lemma: the, pos: NOUN, tag: NN, dep: dobj

The problem is in the first line of the output, where "The big dogs" is parsed unexpectedly: it gets a lemma of "the", a pos of "NOUN", a tag of "NNS", and a dep of "nsubj".

This is the output I was hoping for:

text: The big dogs, lemma: the big dog, pos: NOUN, tag: NNS, dep: nsubj
text: chased, lemma: chase, pos: VERB, tag: VBD, dep: ROOT
text: the fast cat, lemma: the fast cat, pos: NOUN, tag: NN, dep: dobj

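I expected the lemma to be the phrase "the big dog", with the plural changed to singular, and the phrase to have a pos of "NOUN", a tag of "NNS", and a dep of "nsubj".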

Is this the correct behavior, or am I using spaCy incorrectly? If I am using it incorrectly, please let me know the correct way to perform this task.

There are a few things to consider here:

Lemmatization is token based.

POS tagging and dependency parsing are predictive.

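If you take the lemma_ attribute of each token in the chunk instead, you will probably get something like "the big dog". Doing so does not update the token's pos.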
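Also, since the dependency parser and the POS tagger are trained predictive models, their output is not guaranteed to always be "correct" from a human linguistic perspective.

Apart from the lemma issue, it looks like you are using spaCy correctly.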
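
As a minimal sketch of the token-based approach described above: rather than merging the noun chunks, leave the pipeline unmerged, iterate over doc.noun_chunks, and join the lemma_ of each token in the chunk. The helper names chunk_lemma, describe_noun_chunks, and demo are hypothetical and introduced only for illustration. Taking pos, tag, and dep from the chunk's root token is an assumption about what you want to report for the phrase. The demo also assumes that spaCy and the en_core_web_lg model are installed.

```python
def chunk_lemma(tokens):
    """Join the per-token lemmas of a noun chunk into one string."""
    return " ".join(token.lemma_ for token in tokens)


def describe_noun_chunks(doc):
    """Return (text, lemma, pos, tag, dep) for each noun chunk in a parsed Doc.

    pos, tag and dep come from the chunk's root token (its head noun);
    this is an illustrative choice, not the only reasonable one.
    """
    rows = []
    for chunk in doc.noun_chunks:
        root = chunk.root
        rows.append((chunk.text, chunk_lemma(chunk), root.pos_, root.tag_, root.dep_))
    return rows


def demo():
    # Imported here so the helpers above can be used without loading spaCy.
    import spacy

    # Note: merge_noun_chunks is deliberately NOT added to the pipeline,
    # so each token inside a chunk keeps its own lemma.
    nlp = spacy.load("en_core_web_lg")
    doc = nlp("The big dogs chased the fast cat")
    for text, lemma, pos, tag, dep in describe_noun_chunks(doc):
        print("text: {}, lemma: {}, pos: {}, tag: {}, dep: {}".format(text, lemma, pos, tag, dep))
```

Calling demo() prints one line per noun chunk. The verb "chased" is not part of any noun chunk, so it would have to be reported separately if it is needed. The exact lemmas and labels depend on the model version, since tagging and parsing are predictive.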