Python: spaCy's Matcher

I just added the following extension to spaCy tokens:
from spacy.tokens import Token
has_dep = lambda token,name: name in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP', method=has_dep)
So I can check whether a token has a given dependency label among its children, like this:
doc = nlp(u'We are walking around.')
walking = doc[2]
walking._.HAS_DEP('nsubj')
This outputs True, because "walking" has a child whose dependency label is "nsubj" (the word "We").
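The extension itself is just a membership test over the children's dependency labels. A spaCy-free sketch of the same check (StubToken is a made-up stand-in for illustration, not part of spaCy):

```python
# Minimal stub mimicking only the attributes the extension reads.
class StubToken:
    def __init__(self, text, dep_, children=()):
        self.text = text
        self.dep_ = dep_
        self.children = list(children)

# Same logic as the extension: does any child carry the given dep label?
has_dep = lambda token, name: name in [child.dep_ for child in token.children]

we = StubToken("We", "nsubj")
walking = StubToken("walking", "ROOT", children=[we])

print(has_dep(walking, "nsubj"))  # True
print(has_dep(walking, "dobj"))   # False
```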
However, I don't see how to use this extension together with spaCy's Matcher. Below is what I wrote; I expect the output to be "walking", but it doesn't seem to work:
matcher = Matcher(nlp.vocab)
pattern = [
    {"_": {"HAS_DEP": {"name": "nsubj"}}}  # this is the line I'm not sure of
]
matcher.add("depnsubj", None, pattern)
doc = nlp("We're walking around the house.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(span)
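A likely reason the pattern above never matches: the Matcher compares a custom extension's current value against the pattern specification, while a method extension exposes a callable when accessed through `token._`. The sketch below illustrates that comparison with stub code; it is an analogy, not spaCy's actual internals:

```python
import functools

# The extension function from the question.
has_dep = lambda token, name: name in [child.dep_ for child in token.children]

# Stub standing in for a spaCy Token (illustration only).
class StubToken:
    def __init__(self, dep_, children=()):
        self.dep_ = dep_
        self.children = list(children)

walking = StubToken("ROOT", children=[StubToken("nsubj")])

# Roughly what accessing a method extension via token._ yields: a bound callable.
bound = functools.partial(has_dep, walking)

spec = {"name": "nsubj"}
print(bound == spec)   # False: a callable never equals the pattern dict
print(bound("nsubj"))  # True, but the Matcher never calls it with arguments
```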
I think you could use doc.retokenize() and token.head instead, like this:
from spacy.matcher import Matcher
import en_core_web_sm
nlp = en_core_web_sm.load()
matcher = Matcher(nlp.vocab)
pattern = [{'DEP': 'nsubj'}]
matcher.add("depnsubj", None, pattern)
doc = nlp("We're walking around the house.")
matches = matcher(doc)
matched_spans = []
for match_id, start, end in matches:
    span = doc[start:end]
    matched_spans.append(span)
with doc.retokenize() as retokenizer:
    for span in matched_spans:
        retokenizer.merge(span)
        for token in span:
            print(token.head)
Output:
walking
I think what you're after can be achieved with a getter:
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Token
has_dep = lambda token: 'nsubj' in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP_NSUBJ', getter=has_dep, force=True)
nlp = spacy.load("en_core_web_md")
matcher = Matcher(nlp.vocab)
matcher.add("depnsubj", None, [{"_": {"HAS_DEP_NSUBJ": True}}])
doc = nlp("We're walking around the house.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(span)
Output:
walking
Since a Matcher pattern has no mechanism for passing a dependency label name to the extension, I think this is the most practical solution.
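If you need this for several labels, one way to keep the getter approach is a small factory that produces one boolean getter per dependency label. This is a hedged sketch; `make_dep_getter` is a hypothetical helper, not a spaCy API, and the StubToken is only for the spaCy-free check at the bottom:

```python
def make_dep_getter(dep_label):
    """Return a getter closure suitable for Token.set_extension(..., getter=...)."""
    return lambda token: dep_label in [child.dep_ for child in token.children]

# Registration would then look like (one extension per label you care about):
#   Token.set_extension("HAS_DEP_NSUBJ", getter=make_dep_getter("nsubj"), force=True)
#   Token.set_extension("HAS_DEP_DOBJ",  getter=make_dep_getter("dobj"),  force=True)

# Quick spaCy-free check with a stub object standing in for a Token:
class StubToken:
    def __init__(self, dep_, children=()):
        self.dep_ = dep_
        self.children = list(children)

walking = StubToken("ROOT", children=[StubToken("nsubj")])
print(make_dep_getter("nsubj")(walking))  # True
print(make_dep_getter("dobj")(walking))   # False
```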