Python AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'

I'm working on an NLP project and trying to follow this tutorial. When I run this part:
import spacy

# Load the large English NLP model
nlp = spacy.load('en_core_web_lg')

# Replace a token with "REDACTED" if it is a name
def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    else:
        return token.string

# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    for ent in doc.ents:
        ent.merge()
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)
s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky’s
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""
print(scrub(s))
I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-62-ab1c786c4914> in <module>
4 """
5
----> 6 print(scrub(s))
<ipython-input-60-4742408aa60f> in scrub(text)
3 doc = nlp(text)
4 for ent in doc.ents:
----> 5 ent.merge()
6 tokens = map(replace_name_with_placeholder, doc)
7 return "".join(tokens)
AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'
Since that tutorial was written, spaCy has removed the Span.merge method. The way to do this now is with doc.retokenize. I've implemented it in your scrub function below:
# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky's
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""
print(scrub(s))
Other notes:
Your replace_name_with_placeholder function will also throw an error, because token.string has been removed as well. Its replacement is token.text_with_ws (plain token.text drops the trailing whitespace, so the words would run together when joined with "".join). Fixed below:

def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    else:
        return token.text_with_ws
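A quick way to see why the trailing whitespace matters when joining tokens back into a string (this sketch uses a blank English pipeline so no model download is needed; the sentence is made up for illustration):

import spacy

nlp = spacy.blank("en")  # tokenizer only, no trained components
doc = nlp("Alan Turing published an article.")

# text drops the whitespace that followed each token
print("".join(t.text for t in doc))          # AlanTuringpublishedanarticle.
# text_with_ws preserves it, so the original string is reconstructed
print("".join(t.text_with_ws for t in doc))  # Alan Turing published an article.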
If you are merging other spans in addition to entities, such as doc.noun_chunks, you may run into this problem:
ValueError: [E102] Can't merge non-disjoint spans. 'Computing' is already part of
tokens to merge. If you want to find the longest non-overlapping spans, you can
use the util.filter_spans helper:
https://spacy.io/api/top-level#util.filter_spans
For this reason, you may also want to look at spacy.util.filter_spans:
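A minimal sketch of how filter_spans resolves overlapping spans, using a hand-built Doc so it runs without a trained model (the words and span boundaries here are made up for illustration):

from spacy.tokens import Doc
from spacy.util import filter_spans
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["Alan", "Turing", "published", "an", "article"])

# "Alan Turing" and "Turing" overlap; filter_spans keeps the longest
# non-overlapping spans, so only "Alan Turing" and "an article" survive
spans = [doc[0:2], doc[1:2], doc[3:5]]
longest = filter_spans(spans)
print([s.text for s in longest])  # ['Alan Turing', 'an article']

Passing the filtered list to retokenizer.merge avoids the E102 error, because the surviving spans are guaranteed to be disjoint.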