Python: How do I convert entities (a list) into a dictionary? The code I tried is commented out and doesn't work (NLP question)

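How do I convert `entities` (a list) into a dictionary? The code I tried is commented out below and doesn't work. Alternatively, how could I rewrite `entities` as a dictionary? I want a dictionary so that I can find the 5 most frequently named people in the first 500 sentences.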

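# Notebook shell command; outside Jupyter, run `pip install wget` in a terminal
! pip install wget
import wget
url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/moby_dick.txt'
wget.download(url, 'moby_dick.txt')
# Each line of the file is treated as one "sentence"
with open('moby_dick.txt', encoding='utf8') as f:
    documents = [line.strip() for line in f]

import spacy

# spaCy v3 no longer accepts the 'en' shortcut; load an installed pipeline instead
nlp = spacy.load('en_core_web_sm')
# One list of (entity text, entity label) tuples per sentence
entities = [[(entity.text, entity.label_) for entity in nlp(sentence).ents] for sentence in documents[:50]]
entities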


# I tried this, but it doesn't work:
#def Convert(lst): 
#    res_dct = {lst[i]: lst[i + 1] for i in range(0, len(lst), 2)} 
#    return res_dct
#print(Convert(ent))
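Why the commented attempt fails: apart from `ent` being undefined, `Convert` pairs element `i` (as the key) with element `i + 1` (as the value). Every element of `entities` is itself a per-sentence list, and lists are unhashable, so they cannot be dict keys. The sketch below reproduces this on a small hand-made stand-in for `entities` (not real spaCy output); `convert`, `sample`, `flat` and `error` are illustrative names.

```python
def convert(lst):
    # Same logic as the commented attempt: lst[0]: lst[1], lst[2]: lst[3], ...
    return {lst[i]: lst[i + 1] for i in range(0, len(lst), 2)}

# Hand-made stand-in shaped like `entities` (not real spaCy output)
sample = [[], [('Ishmael', 'GPE')], [('Some years ago', 'DATE')], []]

try:
    convert(sample)
except TypeError as exc:
    # The first key tried is the empty list [], which cannot be hashed
    error = str(exc)
print(error)

# Each entity is already a (text, label) pair, so dict() over the flattened
# pairs does work, but a repeated entity text would overwrite its earlier label
flat = [pair for sentence in sample for pair in sentence]
print(dict(flat))  # {'Ishmael': 'GPE', 'Some years ago': 'DATE'}
```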

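The list stored in the variable `entities` has type `list[list[tuple[str, str]]]`: one inner list per sentence, where the first item of each tuple is the entity's text and the second is its type (the spaCy label), for example: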

>>> from pprint import pprint
>>> pprint(entities)
[[],
 [('Ishmael', 'GPE')],
 [('Some years ago', 'DATE')],
 [],
 [('November', 'DATE')],
 [],
 [('Cato', 'ORG')],
 [],
 [],
 [('Manhattoes', 'ORG'), ('Indian', 'NORP')],
 [],
 [('a few hours', 'TIME')],
...

You can then build an inverted `dict` that maps each entity type to the list of entity texts with that type:

>>> sum(filter(None, entities), [])
[('Ishmael', 'GPE'), ('Some years ago', 'DATE'), ('November', 'DATE'), ('Cato', 'ORG'), ('Manhattoes', 'ORG'), ('Indian', 'NORP'), ('a few hours', 'TIME'), ('Sabbath afternoon', 'TIME'), ('Corlears Hook to Coenties Slip', 'WORK_OF_ART'), ('Whitehall', 'PERSON'), ('thousands upon thousands', 'CARDINAL'), ('China', 'GPE'), ('week days', 'DATE'), ('ten', 'CARDINAL'), ('American', 'NORP'), ('June', 'DATE'), ('one', 'CARDINAL'), ('Niagara', 'ORG'), ('thousand miles', 'QUANTITY'), ('Tennessee', 'GPE'), ('two', 'CARDINAL'), ('Rockaway Beach', 'GPE'), ('first', 'ORDINAL'), ('first', 'ORDINAL'), ('Persians', 'NORP')]
>>> from collections import defaultdict
>>> type2entities = defaultdict(list)
>>> for entity, entity_type in sum(filter(None, entities), []):
...   type2entities[entity_type].append(entity)
...
>>> from pprint import pprint
>>> pprint(type2entities)
defaultdict(<class 'list'>,
            {'CARDINAL': ['thousands upon thousands', 'ten', 'one', 'two'],
             'DATE': ['Some years ago', 'November', 'week days', 'June'],
             'GPE': ['Ishmael', 'China', 'Tennessee', 'Rockaway Beach'],
             'NORP': ['Indian', 'American', 'Persians'],
             'ORDINAL': ['first', 'first'],
             'ORG': ['Cato', 'Manhattoes', 'Niagara'],
             'PERSON': ['Whitehall'],
             'QUANTITY': ['thousand miles'],
             'TIME': ['a few hours', 'Sabbath afternoon'],
             'WORK_OF_ART': ['Corlears Hook to Coenties Slip']})
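A note on the flattening step: `sum(list_of_lists, [])` builds a new list at every addition, so it is quadratic in the number of sentences, and the `filter(None, ...)` is not strictly needed because empty lists contribute nothing. A common linear alternative is `itertools.chain.from_iterable`. The sketch below is a variant of the same inversion under that swap; `invert_entities` and `sample` are hypothetical names, and `sample` is hand-made data, not spaCy output.

```python
from collections import defaultdict
from itertools import chain


def invert_entities(entities):
    """Map each entity label to the list of entity texts with that label."""
    type2entities = defaultdict(list)
    # chain.from_iterable flattens the per-sentence lists lazily, in linear time;
    # empty sentence lists simply yield nothing
    for text, label in chain.from_iterable(entities):
        type2entities[label].append(text)
    return dict(type2entities)


# Hand-made sample shaped like `entities` (not real spaCy output)
sample = [[], [('Ishmael', 'GPE')], [('Manhattoes', 'ORG'), ('Indian', 'NORP')], [('China', 'GPE')]]
print(invert_entities(sample))
# {'GPE': ['Ishmael', 'China'], 'ORG': ['Manhattoes'], 'NORP': ['Indian']}
```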

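Thanks a lot! From this inverted dictionary, do you know how I can find the 5 most frequently mentioned people in the first 500 sentences? I think it's something like this:

track = {}
for key, value in ent.items():
    if value not in track:
        track[value] = 0
    else:
        track[value] += 1
print(max(track, key=track.get))

@elham I've updated the answer. If you find it helpful, could you upvote and accept it? Thanks.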
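The counting loop in that comment has two problems: the values of the inverted dict are lists, which cannot be used as dict keys, and a name seen for the first time is counted as 0 instead of 1. A plain-dict version that keeps the same idea might look like the sketch below; `top_people` and `sample` are hypothetical names, and `sample` is hand-made data rather than spaCy output.

```python
def top_people(entities, n=5):
    """Count PERSON entities with a plain dict and return the n most frequent."""
    track = {}
    for sentence in entities:
        for text, label in sentence:
            if label == 'PERSON':
                # dict.get treats an unseen name as 0, so its first mention counts as 1
                track[text] = track.get(text, 0) + 1
    # Sort by count, highest first; sorted() is stable, so ties keep first-seen order
    return sorted(track.items(), key=lambda kv: kv[1], reverse=True)[:n]


sample = [[('Queequeg', 'PERSON')], [], [('Jonah', 'PERSON'), ('Queequeg', 'PERSON')], [('Nantucket', 'GPE')]]
print(top_people(sample, n=2))  # [('Queequeg', 2), ('Jonah', 1)]
```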

>>> from collections import Counter
>>> entities = [[(entity.text, entity.label_) for entity in nlp(sentence).ents] for sentence in documents[:500]]
>>> person_cnt = Counter()
>>> for entity, entity_type in sum(filter(None, entities), []):
...   if entity_type == 'PERSON':
...     person_cnt[entity] += 1
...
>>> person_cnt.most_common(5)
[('Queequeg', 17), ('don', 4), ('Nantucket', 2), ('Jonah', 2), ('Sal', 2)]
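If the inverted dict has already been built from the same 500 sentences, the counts can also be read straight from it: `type2entities['PERSON']` stores one entry per mention, so passing that list to `Counter` gives the mention frequencies. The stand-in `type2entities` below is hand-made, not real output. The ranking only reflects the model's labels, which are not always right (Nantucket, for instance, is an island, not a person).

```python
from collections import Counter, defaultdict

# Hand-made stand-in for the inverted dict (not real spaCy output)
type2entities = defaultdict(list, {
    'PERSON': ['Queequeg', 'Jonah', 'Queequeg', 'Sal'],
    'GPE': ['Ishmael', 'China'],
})

# Each PERSON mention appears once per occurrence, so counting the list ranks people;
# most_common orders equal counts by first appearance
print(Counter(type2entities['PERSON']).most_common(2))  # [('Queequeg', 2), ('Jonah', 1)]
```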