Python: using a custom extension in spaCy's Matcher

Tags: python, methods, nlp, spacy, matcher


I just added the following extension to spaCy's Token:

from spacy.tokens import Token

# True if any child of the token carries the given dependency label
has_dep = lambda token, name: name in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP', method=has_dep)
This lets me check whether a token has a given dependency label among its children, like this:

# assumes a loaded English pipeline, e.g. nlp = spacy.load('en_core_web_sm')
doc = nlp(u'We are walking around.')
walking = doc[2]
walking._.HAS_DEP('nsubj')
This outputs True, because "walking" has a child whose dependency label is "nsubj" (namely the word "we").
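To see exactly which children produce this result, you can print each child of the "walking" token together with its dependency label (a quick sketch; the exact labels depend on the model you loaded):

for child in walking.children:
    print(child.text, child.dep_)
# typical output: We nsubj / are aux / around prt / . punct (labels vary by model)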

However, I can't figure out how to use this extension with spaCy's Matcher. Below is what I wrote; the output I expect is "walking", but it doesn't seem to work:

from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)

pattern = [
    {"_": {"HAS_DEP": {'name': 'nsubj'}}}  # this is the line I'm not sure of
]

matcher.add("depnsubj", None, pattern)

doc = nlp("We're walking around the house.")
matches = matcher(doc)

for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  
    span = doc[start:end]
    print(span)

I think you could use doc.retokenize() and token.head instead, like this:

from spacy.matcher import Matcher
import en_core_web_sm

nlp = en_core_web_sm.load()

matcher = Matcher(nlp.vocab)
pattern = [{'DEP': 'nsubj'}]
matcher.add("depnsubj", None, pattern)

doc = nlp("We're walking around the house.")
matches = matcher(doc)

matched_spans = []
for match_id, start, end in matches:
    matched_spans.append(doc[start:end])

# Merge each matched span into a single token, then print its syntactic head
with doc.retokenize() as retokenizer:
    for span in matched_spans:
        retokenizer.merge(span)
        for token in span:
            print(token.head)
Output:

walking
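Since the pattern matches a single token, a simpler variant of the same idea (my sketch, not part of the answer above) skips the retokenizing and reads the head directly:

for match_id, start, end in matches:
    print(doc[start].head)  # each match is one token, so doc[start] is the nsubj token

This also prints "walking", because the head of "We" is "walking".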

I think what you're after can be achieved with a getter:

import spacy
from spacy.matcher import Matcher
from spacy.tokens import Token
# The dependency label is now baked into the getter, which takes only the token
has_dep = lambda token: 'nsubj' in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP_NSUBJ', getter=has_dep, force=True)

nlp = spacy.load("en_core_web_md")
matcher = Matcher(nlp.vocab)
matcher.add("depnsubj", None, [{"_": {"HAS_DEP_NSUBJ": True}}])

doc = nlp("We're walking around the house.")
matches = matcher(doc)

for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  
    span = doc[start:end]
    print(span)

Output:

walking

Since Matcher patterns provide no mechanism for passing a dependency label name to the extension, I think this is the most practical solution.
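If you need the same check for several dependency labels, one workaround (my sketch, going beyond the answer above; the label list is just an example) is to register one getter per label, since Matcher patterns can't pass arguments:

from spacy.tokens import Token

def make_dep_getter(dep_label):
    # Each getter closes over one fixed dependency label
    return lambda token: dep_label in [child.dep_ for child in token.children]

for label in ('nsubj', 'dobj', 'prep'):
    Token.set_extension('HAS_DEP_' + label.upper(), getter=make_dep_getter(label), force=True)

A pattern can then reference any of them, e.g. [{"_": {"HAS_DEP_DOBJ": True}}].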