Python: Distinguishing countries and cities in spaCy NER


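I am trying to use spaCy NER to extract countries from organization addresses, but it tags both countries and cities with the same label, GPE. Is there any way to distinguish them?

For example: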

import en_core_web_sm

nlp = en_core_web_sm.load()

doc= nlp('Resilience Engineering Institute, Tempe, AZ, United States; Naval Postgraduate School, Department of Operations Research, Monterey, CA, United States; Arizona State University, School of Sustainable Engineering and the Built Environment, Tempe, AZ, United States; Arizona State University, School for the Future of Innovation in Society, Tempe, AZ, United States')

for ent in doc.ents:
    if ent.label_ == 'GPE':
        print(ent.text)
This returns:

Tempe
AZ
United States
United States
Tempe
AZ
United States
Tempe
AZ
United States

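As mentioned, the GPE entity type covers countries, cities, and states, so you will not be able to detect only country entities with the given model.

I would suggest simply creating a list of countries and then checking whether each GPE entity is in that list.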

import en_core_web_sm

nlp = en_core_web_sm.load()

doc= nlp('Resilience Engineering Institute, Tempe, AZ, United States; Naval Postgraduate School, Department of Operations Research, Monterey, CA, United States; Arizona State University, School of Sustainable Engineering and the Built Environment, Tempe, AZ, United States; Arizona State University, School for the Future of Innovation in Society, Tempe, AZ, United States')

# create a list of country names that possibly appear in the text
countries = ['US', 'USA', 'United States']

for ent in doc.ents:
    if ent.label_ == 'GPE':
        # check if the value is in the list of countries
        if ent.text in countries:
            print(ent.text, '-- Country')
        else:
            print(ent.text, '-- City or State')
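This will output the following:

Tempe -- City or State
United States -- Country
Monterey -- City or State
United States -- Country
Tempe -- City or State
United States -- Country
United States -- Country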


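As the other answers mention, the GPE label in the pretrained spaCy models covers countries, cities, and states. There is a workaround, however, and I believe several approaches can be used.

One approach is to add a custom tag to the model. There is a good article that can help you do this. Collecting training data for it can be cumbersome, because you need to label the cities and countries according to their positions in each sentence. I quote the following answer:

spaCy NER model training includes the extraction of other "implicit" features, such as POS and surrounding words.

When you attempt to train on single words, it is unable to get generalized enough features to detect those entities.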
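To make the labelling effort concrete, here is a minimal sketch of how annotated examples for such a custom tag could be prepared. It assumes the common spaCy convention of `(text, {"entities": [(start, end, label)]})` tuples with character offsets. The label names `CITY` and `COUNTRY` and the helper `make_example` are hypothetical and are not part of any pretrained model.

```python
def make_example(text, labelled_phrases):
    """Build one annotated example with character offsets.

    labelled_phrases is a list of (phrase, label) pairs in the order they
    appear in the text; searching from a moving cursor gives repeated
    phrases distinct offsets.
    """
    entities = []
    cursor = 0
    for phrase, label in labelled_phrases:
        start = text.index(phrase, cursor)  # raises ValueError if the phrase is absent
        end = start + len(phrase)
        entities.append((start, end, label))
        cursor = end
    return text, {"entities": entities}


# The entity is labelled inside a full address so the surrounding words are kept.
example = make_example(
    "Arizona State University, Tempe, AZ, United States",
    [("Tempe", "CITY"), ("United States", "COUNTRY")],
)
print(example)
```

Annotating whole addresses rather than isolated names matters here, since the quoted answer points out that single words do not give the model enough context.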

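A simpler workaround is the following.

Install geonamescache (for example with pip install geonamescache).

Then use the following code to get the collections of countries and cities: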

import geonamescache

gc = geonamescache.GeonamesCache()

# gets nested dictionary for countries
countries = gc.get_countries()

# gets nested dictionary for cities
cities = gc.get_cities()
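The documentation states that you can also get many other kinds of location data.

Use a helper function (called gen_dict_extract in the code below) to get all values of a key with a specific name from a nested dictionary.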
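The helper itself is not reproduced in this thread, so the following is only a sketch of what a function with that name and call signature could look like. The argument order `(var, key)` is inferred from the call `gen_dict_extract(cities, 'name')`, and the sample dictionary is illustrative stand-in data, not actual geonamescache output.

```python
def gen_dict_extract(var, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            # Recurse into nested containers to find deeper matches.
            if isinstance(v, (dict, list)):
                yield from gen_dict_extract(v, key)
    elif isinstance(var, list):
        for item in var:
            yield from gen_dict_extract(item, key)


# Small stand-in shaped like a dictionary of country records (assumed layout).
sample = {
    "US": {"name": "United States", "iso": "US"},
    "FR": {"name": "France", "iso": "FR"},
}
print(list(gen_dict_extract(sample, "name")))
```

Because it is a generator, the results can be unpacked into a list with `[*gen_dict_extract(...)]`.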

Load the cities and the countries into two separate lists:

cities = [*gen_dict_extract(cities, 'name')]
countries = [*gen_dict_extract(countries, 'name')]
Then use the following code to distinguish them:

import spacy

nlp = spacy.load("en_core_web_sm")

doc= nlp('Resilience Engineering Institute, Tempe, AZ, United States; Naval Postgraduate School, Department of Operations Research, Monterey, CA, United States; Arizona State University, School of Sustainable Engineering and the Built Environment, Tempe, AZ, United States; Arizona State University, School for the Future of Innovation in Society, Tempe, AZ, United States')

for ent in doc.ents:
    if ent.label_ == 'GPE':
        if ent.text in countries:
            print(f"Country : {ent.text}")
        elif ent.text in cities:
            print(f"City : {ent.text}")
        else:
            print(f"Other GPE : {ent.text}")
Output:

City : Tempe
Other GPE : AZ
Country : United States
Country : United States
City : Tempe
Other GPE : AZ
Country : United States
City : Tempe
Other GPE : AZ
Country : United States
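In the output above, state abbreviations such as AZ fall through to "Other GPE", and each `in` test scans a whole Python list. As a possible refinement, the sketch below moves the lookup into a helper that works on sets and accepts an optional set of state codes. The function name `classify_gpe` and the hard-coded sample sets are assumptions for illustration; in practice the sets would be built from the `countries` and `cities` lists.

```python
def classify_gpe(text, countries, cities, states=frozenset()):
    """Return a coarse label for a GPE string, checking countries first."""
    name = text.strip()
    if name in countries:
        return "Country"
    if name in states:
        return "State"
    if name in cities:
        return "City"
    return "Other GPE"


# Stand-in data; real sets could be created with set(countries) and set(cities).
country_names = {"United States", "France"}
city_names = {"Tempe", "Monterey"}
state_codes = {"AZ", "CA"}

for gpe in ["Tempe", "AZ", "United States", "Springfield"]:
    print(f"{classify_gpe(gpe, country_names, city_names, state_codes)} : {gpe}")
```

Because of the check order, a name that is both a country and a city is reported as a country.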

spaCy's documentation states that the GPE entity type covers countries, cities, and states. So is there any workaround?