Python: merge all records with the same id into a new JSON array
I have to merge a list of dictionaries into a JSON array that groups all records sharing the same cluster id. For example, ids 1 and 2 have the same cluster_id field, so they should be merged as shown in the expected output: under a new field records, the three fields id, name and match_full_address should appear as a JSON array. Singleton records such as id 3 should also appear as a JSON array.
My list of dictionaries:
[{
    'id': 1,
    'name': 'Will Smith',
    'match_full_address': 'Ridge Boulevard,123 Main Street,Branchburg,NJ',
    'cluster_id': 91,
    'lat': 18756.73,
    'longi': -97.395351,
},
{
    'id': 2,
    'name': 'Sandra Bullock',
    'match_full_address': 'New Castle,123 Mountain Ave,Branchburg,NJ',
    'cluster_id': 91,
    'lat': 18756.73,
    'longi': -97.395351,
},
{
    'id': 3,
    'name': 'Tom Cruise',
    'match_full_address': 'MI2, 123 Syracuse Avenue, Branchburg,NJ',
    'cluster_id': 92,
    'lat': 18756.73,
    'longi': -97.395351,
}
]
Expected output:
[{
    'cluster_id': 91,
    'lat': 18756.73,
    'longi': -97.395351,
    'records': [{'id': 1,
                 'name': 'Will Smith',
                 'match_full_address': 'Ridge Boulevard,123 Main Street,Branchburg,NJ'},
                {'id': 2,
                 'name': 'Sandra Bullock',
                 'match_full_address': 'New Castle,123 Mountain Ave,Branchburg,NJ'}]
},
{
    'cluster_id': 92,
    'lat': 18756.73,
    'longi': -97.395351,
    'records': [{'id': 3,
                 'name': 'Tom Cruise',
                 'match_full_address': 'MI2, 123 Syracuse Avenue, Branchburg,NJ'}]
}
]
Although you could still use a comprehension, I don't think this is a good case for one. So just iterate over your list with a plain loop:
#!/usr/bin/env python3
import json
listM = [{
    'id': 1,
    'name': 'Will Smith',
    'match_full_address': 'Ridge Boulevard,123 Main Street,Branchburg,NJ',
    'cluster_id': 91,
    'lat': 18756.73,
    'longi': -97.395351,
},
{
    'id': 2,
    'name': 'Sandra Bullock',
    'match_full_address': 'New Castle,123 Mountain Ave,Branchburg,NJ',
    'cluster_id': 91,
    'lat': 18756.73,
    'longi': -97.395351,
},
{
    'id': 3,
    'name': 'Tom Cruise',
    'match_full_address': 'MI2, 123 Syracuse Avenue, Branchburg,NJ',
    'cluster_id': 92,
    'lat': 18756.73,
    'longi': -97.395351,
}
]
clusters = dict()
for item in listM:
    # Fetch the entry built so far for this cluster, or start a new one
    data = clusters.get(item['cluster_id'], {})
    if len(data) == 0:
        # First record of this cluster: copy the cluster-level fields
        data["cluster_id"] = item["cluster_id"]
        data["lat"] = item["lat"]
        data["longi"] = item["longi"]
        data["records"] = []
    # Keep only the per-record fields
    data["records"].append(
        {
            'id': item['id'],
            'name': item['name'],
            'match_full_address': item['match_full_address']
        }
    )
    clusters[item['cluster_id']] = data
print(list(clusters.values()))
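Since the question asks for a JSON array, and printing a Python list shows the Python repr (single quotes) rather than JSON, the grouped result could be serialized with the standard json module. The sketch below assumes a clusters dict of the shape built by the loop above; the value used here is a shortened stand-in, not real program output.

```python
import json

# Shortened stand-in for the `clusters` dict built by the loop above (assumed shape).
clusters = {
    92: {
        "cluster_id": 92,
        "lat": 18756.73,
        "longi": -97.395351,
        "records": [
            {"id": 3, "name": "Tom Cruise",
             "match_full_address": "MI2, 123 Syracuse Avenue, Branchburg,NJ"},
        ],
    },
}

# json.dumps turns the list of dicts into a JSON array string (double-quoted keys).
json_text = json.dumps(list(clusters.values()), indent=2)
print(json_text)
```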
This kind of problem comes up very often. The answer is always: sort, then group.
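As a minimal sketch of the sort-then-group idea (an illustration, not the answerer's original code), the following uses sorted and itertools.groupby. The function name merge_clusters and the shortened sample rows are assumptions made for this example. A field is lifted to cluster level only when every record in the cluster carries the same value for it; everything else stays inside records.

```python
from itertools import groupby
from operator import itemgetter

# Shortened, hypothetical sample in the same shape as the question's list.
rows = [
    {"id": 1, "name": "Will Smith", "cluster_id": 91, "lat": 18756.73},
    {"id": 3, "name": "Tom Cruise", "cluster_id": 92, "lat": 18756.73},
    {"id": 2, "name": "Sandra Bullock", "cluster_id": 91, "lat": 18756.73},
]

def merge_clusters(rows, key="cluster_id"):
    """Group rows by `key`, lifting fields shared by a whole group to group level."""
    get_key = itemgetter(key)
    result = []
    # groupby only merges adjacent items, so the rows are sorted by the key first.
    for _, group in groupby(sorted(rows, key=get_key), key=get_key):
        members = list(group)
        # A field is "shared" if every member has it with the same value.
        shared = {
            k: v for k, v in members[0].items()
            if all(k in m and m[k] == v for m in members)
        }
        # Everything that is not shared stays inside the per-record dicts.
        records = [{k: v for k, v in m.items() if k not in shared} for m in members]
        result.append({**shared, "records": records})
    return result

merged = merge_clusters(rows)
print(merged)
```

Note that a singleton cluster shares every field trivially, so in this sketch all of its fields move to cluster level and its records entry is left empty; getting the exact expected output would need a fixed list of per-record keys instead.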
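The above solution handles the case where not all elements in a cluster have the same properties (such as lat). In that case it automatically puts lat inside the records array instead of at cluster level. Conversely, if all records share the same value, it is placed outside records.
I will leave it as an exercise to adjust it to get the output you want.
You can use a temporary dict to keep track of records with the same cluster_id, and add the keys of interest to records.
Assuming your list of dicts is stored in the variable l: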
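t = {}
for d in l:
    if d['cluster_id'] not in t:
        # First record of this cluster: copy the cluster-level keys and start an empty records list
        t[d['cluster_id']] = {k: d.get(k, []) for k in ('cluster_id', 'lat', 'longi', 'records')}
    # Keep only the per-record keys
    t[d['cluster_id']]['records'].append({k: d[k] for k in ('id', 'name', 'match_full_address')})
list(t.values())
will return: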
[{'cluster_id': 91,
  'lat': 18756.73,
  'longi': -97.395351,
  'records': [{'id': 1,
               'match_full_address': 'Ridge Boulevard,123 Main '
                                     'Street,Branchburg,NJ',
               'name': 'Will Smith'},
              {'id': 2,
               'match_full_address': 'New Castle,123 Mountain '
                                     'Ave,Branchburg,NJ',
               'name': 'Sandra Bullock'}]},
 {'cluster_id': 92,
  'lat': 18756.73,
  'longi': -97.395351,
  'records': [{'id': 3,
               'match_full_address': 'MI2, 123 Syracuse Avenue, Branchburg,NJ',
               'name': 'Tom Cruise'}]}]
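The same temporary-dict idea could also be wrapped in a small reusable function built on dict.setdefault. This is only a sketch under assumptions: the function name group_by_cluster, its parameters, and the shortened sample data are hypothetical and chosen for illustration.

```python
def group_by_cluster(rows, key="cluster_id",
                     shared=("lat", "longi"),
                     per_record=("id", "name", "match_full_address")):
    """Group rows by `key`; `shared` fields go on the cluster, `per_record` fields into records."""
    grouped = {}
    for row in rows:
        # setdefault stores the new cluster entry the first time a key value is seen
        # and returns the already stored entry on later rows.
        cluster = grouped.setdefault(
            row[key],
            {key: row[key], **{k: row[k] for k in shared}, "records": []},
        )
        cluster["records"].append({k: row[k] for k in per_record})
    return list(grouped.values())

# Shortened, hypothetical sample data.
sample = [
    {"id": 1, "name": "A", "match_full_address": "x", "cluster_id": 91, "lat": 1.0, "longi": 2.0},
    {"id": 2, "name": "B", "match_full_address": "y", "cluster_id": 91, "lat": 1.0, "longi": 2.0},
    {"id": 3, "name": "C", "match_full_address": "z", "cluster_id": 92, "lat": 1.0, "longi": 2.0},
]
result = group_by_cluster(sample)
print(result)
```

Unlike a sort-based approach, this keeps clusters in the order in which they first appear in the input and takes the cluster-level values from the first record of each cluster.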