Python - Find duplicates in a list of dictionaries and group them


python json list dictionary


I'm not a programmer and I'm new to Python. I have a list of dicts loaded from a JSON file:

# JSON file (film.json)
[{"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["19,00"]},
{"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fiction"], "price": ["20,00"]},
{"year": ["2003"], "director": ["Tarantino"], "film": ["Kill Bill vol.1"], "price": ["10,00"]},
{"year": ["2003"], "director": ["Wachowski"], "film": ["The Matrix Reloaded"], "price": ["9,99"]},
{"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fyction"], "price": ["15,00"]},
{"year": ["1994"], "director": ["E. de Souza"], "film": ["Street Fighter"], "price": ["2,00"]},
{"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["20,00"]},
{"year": ["1982"], "director": ["Ridley Scott"], "film": ["Blade Runner"], "price": ["19,99"]}]

I can load the JSON file like this:

import json

# the with-block closes the file once it has been parsed
with open('film.json') as json_file:
    f = json.load(json_file)

But after that, I can't find the occurrences in the list and group them by film title. This is what I want to achieve:

## result grouped by 'film'
#group 1
{"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["19,00"]}
{"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["20,00"]}
#group 2
{"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fiction"], "price": ["20,00"]}
{"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fyction"], "price": ["15,00"]}
#group X
 ...

Or, even better:

new_dict = { 'group1':[[],[],...] , 'group2':[[],[],...] , 'groupX':[...] }

At the moment I'm experimenting with nested for loops, but without luck.
Thanks, everyone.
Note: "Pulp Fyction" is misspelled on purpose, for a future fuzzy string matching implementation. For now I only need a "duplicates grouper".
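Note 2: this is for Python 2.x.
Since the data is not sorted, use a collections.defaultdict to materialise a new list per key, then key on the film title: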
from collections import defaultdict

# each film title maps to the list of records that share it
grouped = defaultdict(list)

for film in f:
    grouped[film['film'][0]].append(film)
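A self-contained sketch of the same defaultdict approach, run on the sample records from the question (the inline films list stands in for the loaded film.json). It goes one step beyond the answer, based on my assumption about what was wanted: it keeps only titles that occur more than once and renames them group1, group2, ... to match the new_dict layout from the question. Written for Python 3; on Python 2.x the group numbering order is not guaranteed.

```python
from collections import defaultdict
from pprint import pprint

# Sample records copied from film.json in the question
films = [
    {"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["19,00"]},
    {"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fiction"], "price": ["20,00"]},
    {"year": ["2003"], "director": ["Tarantino"], "film": ["Kill Bill vol.1"], "price": ["10,00"]},
    {"year": ["2003"], "director": ["Wachowski"], "film": ["The Matrix Reloaded"], "price": ["9,99"]},
    {"year": ["1994"], "director": ["Tarantino"], "film": ["Pulp Fyction"], "price": ["15,00"]},
    {"year": ["1994"], "director": ["E. de Souza"], "film": ["Street Fighter"], "price": ["2,00"]},
    {"year": ["1999"], "director": ["Wachowski"], "film": ["The Matrix"], "price": ["20,00"]},
    {"year": ["1982"], "director": ["Ridley Scott"], "film": ["Blade Runner"], "price": ["19,99"]},
]

# One pass: each title maps to the list of records that carry it
grouped = defaultdict(list)
for film in films:
    grouped[film["film"][0]].append(film)

# Keep only titles seen more than once (the actual duplicates)
duplicates = {title: rows for title, rows in grouped.items() if len(rows) > 1}

# Rename to group1, group2, ... in first-seen order (dicts keep insertion order on Python 3.7+)
new_dict = {"group%d" % i: rows for i, rows in enumerate(duplicates.values(), start=1)}

pprint(new_dict)
```

With exact titles only "The Matrix" qualifies; "Pulp Fiction" and "Pulp Fyction" end up in separate groups.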

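The film['film'][0] value is used to group the films. If you want more sophisticated title grouping, you'd have to create a canonical version of that key.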
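One possible canonical key, sketched under the assumption that letter case and extra whitespace are the only differences you want to ignore. canonical_title and group_by_title are hypothetical helper names, not part of the answer.

```python
from collections import defaultdict

def canonical_title(title):
    # Hypothetical normalisation: lower-case and collapse runs of whitespace
    return " ".join(title.lower().split())

def group_by_title(records, key=canonical_title):
    # Same defaultdict pattern as the answer, but keyed on the normalised title
    grouped = defaultdict(list)
    for record in records:
        grouped[key(record["film"][0])].append(record)
    return grouped

rows = [
    {"film": ["The Matrix"], "price": ["19,00"]},
    {"film": ["the  matrix "], "price": ["20,00"]},
    {"film": ["Pulp Fiction"], "price": ["20,00"]},
]
groups = group_by_title(rows)
```

This still keeps "Pulp Fiction" and "Pulp Fyction" apart; catching that needs the fuzzy matching mentioned in the question.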
Demo:
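If it's a one-off and I'm in a hurry, I'd do it like this. For this example, assume your list of dictionaries is lod and the film title is always a list with exactly one item:
new_dict = {k: [d for d in lod if d.get('film')[0] == k] for k in set(d.get('film')[0] for d in lod)}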

To make it more readable and explain what it's doing, here is the same thing spelled out, again with the list of dictionaries as lod:
# get all the unique film names
# note: the [0] is because the title is stored as a list, and a set can't hold lists (they're unhashable),
# so we just take the first (only) item for this example.
films = set(d.get('film')[0] for d in lod)


#create a dictionary
new_dict = {}

#iterate over the unique film names
for k in films:
    #make a list of all the films that match the name we're on
    filmswiththisname = [d for d in lod if d.get('film')[0] == k]
    #add the list of films to the new dictionary with the film name as the key.
    new_dict[k] = filmswiththisname
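The comprehension above scans the whole list once per unique title. An alternative sketch uses itertools.groupby, which only merges adjacent items with equal keys, so the list has to be sorted by the same key first. The unsorted run below shows what goes wrong otherwise, which is also why unsorted data points towards defaultdict. The helper names title and group_sorted are illustrative.

```python
from itertools import groupby

def title(record):
    # The title is stored as a one-item list, e.g. {"film": ["The Matrix"]}
    return record["film"][0]

def group_sorted(records):
    # groupby only merges *adjacent* equal keys, so sort by the same key first
    ordered = sorted(records, key=title)
    return {key: list(items) for key, items in groupby(ordered, key=title)}

lod = [
    {"film": ["The Matrix"], "price": ["19,00"]},
    {"film": ["Pulp Fiction"], "price": ["20,00"]},
    {"film": ["The Matrix"], "price": ["20,00"]},
]

# Without sorting, the two Matrix rows end up in separate runs
unsorted_keys = [key for key, _ in groupby(lod, key=title)]

# Sorting first gives one group per title (sorted() is stable, so row order is kept)
result = group_sorted(lod)
```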

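What are you grouping on? Just the title? Title+director+year? Why not name your groups by film?
@wim I want to group the entire dict rows (title, director, year, price) by the value of the 'film' key. Yes, only the title.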
>>> pprint(dict(grouped_by_soundex))
{u'B436': [{u'director': [u'Ridley Scott'],
            u'film': [u'Blade Runner'],
            u'price': [u'19,99'],
            u'year': [u'1982']}],
 u'K414': [{u'director': [u'Tarantino'],
            u'film': [u'Kill Bill vol.1'],
            u'price': [u'10,00'],
            u'year': [u'2003']}],
 u'P412': [{u'director': [u'Tarantino'],
            u'film': [u'Pulp Fiction'],
            u'price': [u'20,00'],
            u'year': [u'1994']},
           {u'director': [u'Tarantino'],
            u'film': [u'Pulp Fyction'],
            u'price': [u'15,00'],
            u'year': [u'1994']}],
 u'S363': [{u'director': [u'E. de Souza'],
            u'film': [u'Street Fighter'],
            u'price': [u'2,00'],
            u'year': [u'1994']}],
 u'T536': [{u'director': [u'Wachowski'],
            u'film': [u'The Matrix'],
            u'price': [u'19,00'],
            u'year': [u'1999']},
           {u'director': [u'Wachowski'],
            u'film': [u'The Matrix Reloaded'],
            u'price': [u'9,99'],
            u'year': [u'2003']},
           {u'director': [u'Wachowski'],
            u'film': [u'The Matrix'],
            u'price': [u'20,00'],
            u'year': [u'1999']}]}
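The code that produced grouped_by_soundex is not included above. The following is a hypothetical reconstruction, not the answerer's code: a plain American Soundex applied to the letters of each title, with spaces, digits and punctuation dropped first (an assumption), which yields the same keys as the output above for this data. Written for Python 3.

```python
from collections import defaultdict

# American Soundex digit for each consonant; vowels, y, h and w have none
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit

def soundex(text):
    # Assumption: spaces, digits and punctuation are dropped before coding
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters that share a code
        code = SOUNDEX_CODES.get(letter, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return result.ljust(4, "0")

titles = ["The Matrix", "Pulp Fiction", "Kill Bill vol.1", "The Matrix Reloaded",
          "Pulp Fyction", "Street Fighter", "The Matrix", "Blade Runner"]

grouped_by_soundex = defaultdict(list)
for title in titles:
    grouped_by_soundex[soundex(title)].append(title)
```

As the output above shows, Soundex also pulls "The Matrix Reloaded" into the "The Matrix" group, so it is a coarse key rather than a precise duplicate detector.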
