Python AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'

I am working on an NLP project and trying to follow this tutorial. When I run this part:

import spacy

# Load the large English NLP model
nlp = spacy.load('en_core_web_lg')

# Replace a token with "REDACTED" if it is a name
def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    else:
        return token.string

# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    for ent in doc.ents:
        ent.merge()
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky’s
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""

print(scrub(s))
I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-62-ab1c786c4914> in <module>
  4 """
  5 
  ----> 6 print(scrub(s))

<ipython-input-60-4742408aa60f> in scrub(text)
  3     doc = nlp(text)
  4     for ent in doc.ents:
  ----> 5         ent.merge()
  6     tokens = map(replace_name_with_placeholder, doc)
  7     return "".join(tokens)

 AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'
Since that tutorial was written, spaCy has removed the Span.merge method. The way to do this now is with doc.retokenize. I have implemented it for your scrub function below:

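# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    # Merge each entity span into a single token
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky’s
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""

print(scrub(s))

Other notes: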

Your replace_name_with_placeholder function will also throw an error because of token.string. Use token.text instead; I have fixed it below:

 def replace_name_with_placeholder(token):
     if token.ent_iob != 0 and token.ent_type_ == "PERSON":
         return "[REDACTED] "
     else:
         return token.text
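One caveat with the fix above: as far as I know, token.text does not include the whitespace that follows a token, so joining the pieces with "" can run the words together. Here is a minimal sketch of a whitespace-preserving variant, assuming spaCy is installed. It uses spacy.blank("en") and a hand-labelled entity (both my assumptions, standing in for en_core_web_lg and its entity recognizer) so that it runs without downloading a model. scrub_doc is a hypothetical helper name.

```python
import spacy

# Tokenizer-only pipeline; an assumption so the sketch needs no model download.
nlp = spacy.blank("en")

def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        # Keep whatever whitespace followed the original name
        return "[REDACTED]" + token.whitespace_
    # text_with_ws is the token text plus its trailing whitespace, if any
    return token.text_with_ws

def scrub_doc(doc):
    # Merge each entity span into a single token, then rebuild the text
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    return "".join(replace_name_with_placeholder(t) for t in doc)

text = "In 1950, Alan Turing published his famous article."
doc = nlp(text)

# Hand-label the name as PERSON, standing in for a real NER component.
name = "Alan Turing"
start = text.index(name)
doc.ents = [doc.char_span(start, start + len(name), label="PERSON")]

result = scrub_doc(doc)
print(result)
```

With a full model such as en_core_web_lg you would drop the hand-labelling and let the pipeline fill in doc.ents.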
If you are merging entities together with other spans, such as doc.noun_chunks, you may run into this problem:

 ValueError: [E102] Can't merge non-disjoint spans. 'Computing' is already part of 
 tokens to merge. If you want to find the longest non-overlapping spans, you can 
 use the util.filter_spans helper:
 https://spacy.io/api/top-level#util.filter_spans
For this reason, you may also want to look at spacy.util.filter_spans.
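To show what "longest non-overlapping spans" means in that error message, here is a rough pure-Python sketch of the selection idea on plain (start, end) token offsets. It is an illustrative stand-in, not spaCy's own code; longest_non_overlapping and the sample offsets are hypothetical. In real code you would pass the combined list of Span objects to spacy.util.filter_spans before merging.

```python
def longest_non_overlapping(spans):
    """Keep the longest spans first and drop any span that overlaps one already kept.

    `spans` is a list of (start, end) token offsets, with `end` exclusive.
    """
    # Longer spans first; among equal lengths, the earlier start wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    used = set()  # token indices already covered by a kept span
    for start, end in ordered:
        if used.isdisjoint(range(start, end)):
            kept.append((start, end))
            used.update(range(start, end))
    # Return the survivors in document order
    return sorted(kept)

# Hypothetical offsets: an entity (10, 14), a noun chunk (8, 14) that
# contains it, and an unrelated entity (3, 5).
candidates = [(10, 14), (8, 14), (3, 5)]
print(longest_non_overlapping(candidates))
```

Only the disjoint survivors would then be passed to retokenizer.merge, which avoids the E102 error shown above.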