spaCy ent.label cannot identify organizations


I am using spaCy to analyze articles about terrorism, and strangely spaCy cannot find organizations such as Fatah. The code is as follows:

import spacy
from collections import defaultdict, Counter
nlp = spacy.load('en')
def read_file_to_list(file_name):
    with open(file_name, 'r') as file:
        return file.readlines()
terrorism_articles = read_file_to_list('data/rand-terrorism-dataset.txt')
terrorism_articles_nlp = [nlp(art) for art in terrorism_articles]
common_terrorist_groups = [
    'taliban', 
    'al - qaeda', 
    'hamas',  
    'fatah', 
    'plo', 
    'bilad al - rafidayn'
]

common_locations = [
    'iraq',
    'baghdad', 
    'kirkuk', 
    'mosul', 
    'afghanistan', 
    'kabul',
    'basra', 
    'palestine', 
    'gaza', 
    'israel', 
    'istanbul', 
    'beirut', 
    'pakistan'
]
location_entity_dict = defaultdict(Counter)

for article in terrorism_articles_nlp:
    
    article_terrorist_groups = [ent.lemma_ for ent in article.ents if ent.label_=='PERSON' or ent.label_ =='ORG']  # person or organization
    article_locations = [ent.lemma_ for ent in article.ents if ent.label_=='GPE']
    terrorist_common = [ent for ent in article_terrorist_groups if ent in common_terrorist_groups]
    locations_common = [ent for ent in article_locations if ent in common_locations]
    
    for found_entity in terrorist_common:
        for found_location in locations_common:
            location_entity_dict[found_entity][found_location] += 1
location_entity_dict

I simply get nothing from the file.

Thank you all!

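I copied your example, and it looks like you end up with empty lists for article_terrorist_groups and terrorist_common. As a result, you will not get the output you want (I assume). I changed the model (on my machine) to en_core_web_sm and observed that ent.label differs from the labels specified in the if statement of your list comprehension. I am almost certain this is the case whether you use spacy.load('en') or spacy.load('en_core_web_sm').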

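You are using if ent.label == 'PERSON' or ent.label == 'ORG', which results in empty lists. You need to change this for the code to work. In effect, the list comprehensions for article_terrorist_groups and terrorist_common produce nothing, so the for loops that follow iterate over empty lists.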

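If you look at the output I posted, you will see that ent.label is not 'PERSON' or 'ORG'. The values printed there (383, 384, and so on) are integer IDs: ent.label holds the ID of the label, while ent.label_ (with the trailing underscore) holds its string name.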

Note: I suggest adding print statements to your code (or using a debugger) so you can check intermediate values from time to time.
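To make the difference between ent.label and ent.label_ concrete, here is a minimal sketch. It assumes spaCy is installed and uses a blank English pipeline with hand-annotated entities (the sentence and the spans are made up for illustration), so it only shows how the two attributes behave, not what a trained model would predict.

```python
import spacy
from spacy.tokens import Span

# A blank English pipeline: tokenizer only, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("Fatah operates in Gaza")

# Annotate entities by hand (illustration only; a trained model would predict these).
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 3, 4, label="GPE")]

for ent in doc.ents:
    # ent.label is an integer ID, ent.label_ is the readable string name.
    print(ent.text, ent.label, ent.label_)

# Comparing the integer ID with a string never matches, so this list stays empty.
orgs_by_id = [ent.text for ent in doc.ents if ent.label == "ORG"]

# Comparing the string form selects the entity as intended.
orgs_by_name = [ent.text for ent in doc.ents if ent.label_ == "ORG"]
```

If a filter on entity labels unexpectedly returns nothing, printing both attributes side by side like this is a quick way to see which one the comparison is actually using.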

My code:

import spacy
from collections import defaultdict, Counter
nlp = spacy.load('en_core_web_sm') # I changed this
def read_file_to_list(file_name):
    with open(file_name, 'r') as file:
        return file.readlines()

terrorism_articles = read_file_to_list('rand-terrorism-dataset.txt')
terrorism_articles_nlp = [nlp(art) for art in terrorism_articles]
common_terrorist_groups = [
    'taliban', 
    'al - qaeda', 
    'hamas',  
    'fatah', 
    'plo', 
    'bilad al - rafidayn'
]

common_locations = [
    'iraq',
    'baghdad', 
    'kirkuk', 
    'mosul', 
    'afghanistan', 
    'kabul',
    'basra', 
    'palestine', 
    'gaza', 
    'israel', 
    'istanbul', 
    'beirut', 
    'pakistan'
]
location_entity_dict = defaultdict(Counter)


for article in terrorism_articles_nlp:
    print([(ent.lemma_, ent.label) for ent in article.ents])

Output:

[('CHILE', 383), ('the Santiago Binational Center', 383), ('21,000', 394)]
[('ISRAEL', 384), ('palestinian', 381), ('five', 397), ('Masada', 384)]
[('GUATEMALA', 383), ('U.S. Marines', 381), ('Guatemala City', 384)]

The output is truncated to keep this answer short.

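This happens because the groups and locations in common_terrorist_groups and common_locations are lowercase, while the entities found in the articles are uppercase (for example 'CHILE' and 'ISRAEL'). So simply change if ent in common_terrorist_groups to if ent.lower() in common_terrorist_groups: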

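common_terrorist_groups = [
    'taliban',
    'al - qaeda',
    'hamas',
    'fatah',
    'plo',
    'bilad al - rafidayn'
]
common_locations = [
    'iraq',
    'baghdad',
    'kirkuk',
    'mosul',
    'afghanistan',
    'kabul',
    'basra',
    'palestine',
    'gaza',
    'israel',
    'istanbul',
    'beirut',
    'pakistan'
]
location_entity_dict = defaultdict(Counter)
for article in terrorism_articles_nlp:
    article_terrorist_cands = [ent.lemma_ for ent in article.ents if ent.label_ == 'PERSON' or ent.label_ == 'ORG']
    article_location_cands = [ent.lemma_ for ent in article.ents if ent.label_ == 'GPE']
    # Compare in lowercase so that 'FATAH' matches 'fatah'
    terrorist_candidates = [ent for ent in article_terrorist_cands if ent.lower() in common_terrorist_groups]
    location_candidates = [loc for loc in article_location_cands if loc.lower() in common_locations]
    for found_entity in terrorist_candidates:
        for found_location in location_candidates:
            location_entity_dict[found_entity][found_location] += 1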
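The case-insensitive matching and the co-occurrence counting can also be tried without spaCy. The sketch below uses only the standard library; the helper keep_known and the sample entity strings are hypothetical and stand in for the lemmas that spaCy would return for each article.

```python
from collections import Counter, defaultdict

# Lowercase reference sets (a shortened version of the lists in the thread).
known_groups = {"taliban", "hamas", "fatah", "plo"}
known_locations = {"iraq", "gaza", "israel", "kabul"}


def keep_known(found, known):
    """Return the lowercase form of each found string that is in the known set."""
    return [item.lower() for item in found if item.lower() in known]


# Hypothetical (groups, locations) strings per article, in mixed case.
articles = [
    (["FATAH", "Hamas"], ["GAZA", "Israel"]),
    (["Taliban", "U.S. Marines"], ["KABUL"]),
]

# Count how often each known group is mentioned together with each known location.
location_entity_dict = defaultdict(Counter)
for found_groups, found_locations in articles:
    for group in keep_known(found_groups, known_groups):
        for location in keep_known(found_locations, known_locations):
            location_entity_dict[group][location] += 1

print(dict(location_entity_dict))
```

Unlike the code above, this sketch also stores the keys in lowercase, so 'FATAH' and 'Fatah' are counted under the same entry. Whether that is desirable depends on how the counts are used afterwards.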
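Thank you for your answer! However, if you check the line article_locations = [ent.lemma_ for ent in article.ents if ent.label_ == 'GPE'], you will only get ['Paris', 'Porto Vecchio', 'Corsica']. GPE here refers to countries, cities, and states. spaCy does not label Chile as GPE, and I do not understand why.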