Python AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'

I'm working on an NLP project and trying to follow this tutorial. When I run this part:
import spacy

# Load the large English NLP model
nlp = spacy.load('en_core_web_lg')

# Replace a token with "REDACTED" if it is a name
def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    else:
        return token.string

# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    for ent in doc.ents:
        ent.merge()
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)
s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky’s
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""
print(scrub(s))
I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-62-ab1c786c4914> in <module>
4 """
5
----> 6 print(scrub(s))
<ipython-input-60-4742408aa60f> in scrub(text)
3 doc = nlp(text)
4 for ent in doc.ents:
----> 5 ent.merge()
6 tokens = map(replace_name_with_placeholder, doc)
7 return "".join(tokens)
AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'
Since that tutorial was written, spaCy has removed the Span.merge method. The way to do this now is with doc.retokenize. I've implemented it in your scrub function below:
# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky's
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""
print(scrub(s))
Other notes:
Your replace_name_with_placeholder function will also throw an error, because token.string has been removed as well. Its replacement is token.text_with_ws (plain token.text drops the trailing whitespace, so the words would run together when joined with "".join). Fixed below:

def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    else:
        return token.text_with_ws
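A quick way to see why the trailing whitespace matters when joining tokens back into a string (this sketch uses a blank English pipeline so no model download is needed; the sentence is made up for illustration):

import spacy

nlp = spacy.blank("en")  # tokenizer only, no trained components
doc = nlp("Alan Turing published an article.")

# text drops the whitespace that followed each token
print("".join(t.text for t in doc))          # AlanTuringpublishedanarticle.
# text_with_ws preserves it, so the original string is reconstructed
print("".join(t.text_with_ws for t in doc))  # Alan Turing published an article.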
If you are merging other spans in addition to entities, such as doc.noun_chunks, you may run into this problem:
ValueError: [E102] Can't merge non-disjoint spans. 'Computing' is already part of
tokens to merge. If you want to find the longest non-overlapping spans, you can
use the util.filter_spans helper:
https://spacy.io/api/top-level#util.filter_spans
For this reason, you may also want to look at spacy.util.filter_spans:
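A minimal sketch of how filter_spans resolves overlapping spans, using a hand-built Doc so it runs without a trained model (the words and span boundaries here are made up for illustration):

from spacy.tokens import Doc
from spacy.util import filter_spans
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Alan", "Turing", "published", "an", "article"])

# "Alan Turing" and "Turing" overlap; filter_spans keeps the longest
# non-overlapping spans, so only "Alan Turing" and "an article" survive
spans = [doc[0:2], doc[1:2], doc[3:5]]
longest = filter_spans(spans)
print([s.text for s in longest])  # ['Alan Turing', 'an article']

Passing the filtered list to retokenizer.merge avoids the E102 error, because the surviving spans are guaranteed to be disjoint.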