Python: How to find proper nouns using spaCy NLP


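I am building a keyword extractor with spaCy. The keyword I am looking for is OpTic Gaming in the following text:

"The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017"

How can I parse OpTic Gaming out of this text? If I use noun chunks, I get OpTic Gaming's main sponsors, and if I use tokens, I get ["OpTic", "Gaming", "'s"].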

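The noun chunks, with their dependency label and head where given, come out as:

The company nsubj was
OpTic Gaming's main sponsors pobj of
their first Call pobj to
Duty Championship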

spaCy extracts the part of speech (proper noun, determiner, verb, etc.) for you. You can read it from each token with token.pos_. In your case:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017")

for tok in doc:
    print(tok, tok.pos_)

one NUM
of ADP
OpTic PROPN
Gaming PROPN

You can then filter the proper nouns, group consecutive proper nouns together, and slice the doc to get the nominal groups:

def extract_proper_nouns(doc):
    pos = [tok.i for tok in doc if tok.pos_ == "PROPN"]
    consecutives = []
    current = []
    for elt in pos:
        if len(current) == 0:
            current.append(elt)
        else:
            if current[-1] == elt - 1:
                current.append(elt)
            else:
                consecutives.append(current)
                current = [elt]
    if len(current) != 0:
        consecutives.append(current)
    return [doc[consecutive[0]:consecutive[-1]+1] for consecutive in consecutives]

extract_proper_nouns(doc)

[OpTic Gaming, Duty Championship]
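The grouping of consecutive proper nouns can also be written with itertools.groupby from the standard library. The following is only a minimal sketch, not the answer's own method: it assumes the tokens are given as (text, pos) pairs, which for a spaCy doc could be built as [(t.text, t.pos_) for t in doc]. The function name group_proper_nouns and the sample tags are hypothetical; the tags are written by hand for illustration rather than produced by a model.

```python
from itertools import groupby


def group_proper_nouns(tagged):
    """Join each run of consecutive PROPN tokens into one phrase.

    `tagged` is a sequence of (text, pos) pairs, for example
    [(t.text, t.pos_) for t in doc] for a spaCy Doc.
    """
    phrases = []
    # groupby splits the sequence into runs that share the same key,
    # here the answer to "is this token tagged as a proper noun?"
    for is_propn, run in groupby(tagged, key=lambda pair: pair[1] == "PROPN"):
        if is_propn:
            phrases.append(" ".join(text for text, _ in run))
    return phrases


# Hand-written tags for illustration (an assumption, not model output)
sample = [
    ("one", "NUM"), ("of", "ADP"),
    ("OpTic", "PROPN"), ("Gaming", "PROPN"), ("'s", "PART"),
    ("main", "ADJ"), ("sponsors", "NOUN"),
]
print(group_proper_nouns(sample))  # ['OpTic Gaming']
```

Unlike slicing the doc, this sketch returns plain strings joined with single spaces, so it loses the original whitespace and the link back to the Span objects. Slicing the doc is the better choice when character offsets or other token attributes are needed later.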
