How to join two JSON files in Python without nested loops

Tags: python, json, performance, dictionary, merge

Each time, I get 500 records from file1 and join them with file2, which contains more than 100,000 records. This takes two minutes.

with open(file1,'r') as f1,open(file2,'r') as f2:
    a = json.load(f1)
    b = json.load(f2)
    list_a = []
    for i in range(len(a)):
        for n in range(len(b)):
            if b[n]["id"] == a[i]["id"]:
                list_a.append(dict(b[n], **a[i]))
with open(result,'w') as f3:
    json.dump(list_a, f3,sort_keys=True, ensure_ascii=False)
File 1:

[{ "id":"1", "name":"Tom" }, 
{ "id":"2", "name":"Jim" }, 
{ "id":"3", "name":"Bob" }, 
{ "id":"4", "name":"Jeny" },  
{ "id":"5", "name":"Lara" }, 
{ "id":"6", "name":"Lin" }, 
{ "id":"7", "name":"Kim" }, 
{ "id":"8", "name":"Jack" }, 
{ "id":"9", "name":"Tony" }]
File 2:

[ { "id":"1", "Details":[ { "label":"jcc", "hooby":"Swimming" }, { "label":"hkt", "hooby":"Basketball" } ] }, 
{ "id":"2", "Details":[ { "label":"NTC", "hooby":"Games" } ] } ]
Result:

[ { "id":"1", "name":"Tom", "Details":[ { "label":"jcc", "hooby":"Swimming" }, { "label":"hkt", "hooby":"Basketball" } ] }, 
{ "id":"2", "name":"Jim", "Details":[ { "label":"NTC", "hooby":"Games" } ] } ]
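Your code runs in O(N*M) time (where N == len(a) and M == len(b)), which is too slow for files this large. By first building a mapping from the IDs in list a to their records and then using it to look up the matching IDs from list b, you can make it run in O(N+M) time, for example: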

import json

with open('file1') as f1, open('file2') as f2, open('file3', 'w') as f3:
    a = json.load(f1)
    b = json.load(f2)

    aid = {d['id']: d for d in a}
    list_a = [{k: v for d in (b_dict, aid[b_dict['id']]) for k, v in d.items()}
              for b_dict in b if b_dict['id'] in aid]

    json.dump(list_a, f3, sort_keys=True, ensure_ascii=False)

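If you want the code to stay compatible with Python 2.x, you can merge the dictionaries with a dict comprehension (as shown above). In Python 3.5+, you can simply use unpacking, e.g. {**d1, **d2}.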
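
As a minimal sketch of the Python 3.5+ variant, the same join can be written with unpacking. The function name join_by_id and the shortened sample data are made up for illustration. On key clashes the values from a win, which matches dict(b[n], **a[i]) in the question.

```python
def join_by_id(a, b):
    """Join two lists of dicts on 'id' in O(N+M) time (needs Python 3.5+)."""
    # Index the records of a by id once, so each lookup is O(1) on average.
    a_by_id = {rec['id']: rec for rec in a}
    # Keep only records of b whose id also appears in a; a's values win on clashes.
    return [{**rec, **a_by_id[rec['id']]} for rec in b if rec['id'] in a_by_id]


# Hypothetical sample data, shortened from the question.
a = [{'id': '1', 'name': 'Tom'}, {'id': '2', 'name': 'Jim'}, {'id': '3', 'name': 'Bob'}]
b = [{'id': '1', 'Details': [{'label': 'jcc'}]}, {'id': '2', 'Details': [{'label': 'NTC'}]}]

joined = join_by_id(a, b)
print(joined)
```

One difference from the nested loops is worth keeping in mind: if list a contains the same id more than once, the index keeps only the last such record, whereas the nested loops would emit one merged row per duplicate.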

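I don't have enough experience to know whether this will speed things up, and the solution provided by Eugene Yarmash seems more solid. I also don't have large files to test the speed with, but you could try it and see whether using the collections module makes the iteration faster. In fact, I'm curious myself whether it changes anything: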

File1 = [ { "id":"1", "name":"Tom" }, { "id":"2", "name":"Jim" }, { "id":"3", "name":"Bob" }, { "id":"4", "name":"Jeny" }, { "id":"5", "name":"Lara" }, { "id":"6", "name":"Lin" }, { "id":"7", "name":"Kim" }, { "id":"8", "name":"Jack" }, { "id":"9", "name":"Tony" } ]
File2 = [ { "id":"1", "Details":[ { "label":"jcc", "hooby":"Swimming" }, { "label":"hkt", "hooby":"Basketball" }, ] }, { "id":"2", "Details":[ { "label":"NTC", "hooby":"Games" } ] } ] 

from collections import defaultdict

d = defaultdict(dict)
for l in (File1, File2):
    for elem in l:
        d[elem['id']].update(elem)
Result = dict(d)
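
Note that the defaultdict merge above keeps every id that appears in either list, so ids "3" to "9" end up in Result with only a name. If only ids present in both files are wanted, one possible sketch is to intersect the key sets first. The helper name merge_common and the short sample lists are hypothetical, and the code assumes Python 3.5+.

```python
def merge_common(list1, list2):
    """Merge records by 'id', keeping only ids present in both lists."""
    by_id1 = {rec['id']: rec for rec in list1}
    by_id2 = {rec['id']: rec for rec in list2}
    # In Python 3, dict key views support set operations, so & yields the shared ids.
    common = by_id1.keys() & by_id2.keys()
    # sorted() is only there to make the output order deterministic.
    return {key: {**by_id2[key], **by_id1[key]} for key in sorted(common)}


# Hypothetical sample data: only id "1" occurs in both lists.
sample1 = [{'id': '1', 'name': 'Tom'}, {'id': '3', 'name': 'Bob'}]
sample2 = [{'id': '1', 'Details': [{'label': 'jcc'}]}, {'id': '2', 'Details': []}]

common_result = merge_common(sample1, sample2)
print(common_result)
```

Building the two indexes and intersecting their keys is linear in the total number of records, so this stays in the same O(N+M) range as the dict-based lookup.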

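Comments:

Make one or more dictionaries keyed by id.

Thanks for your reply. I tested it successfully in Python 3, but Python 2.7 fails on the list comprehension (for item in b if item["id"] in a]) with "SyntaxError: invalid syntax". How can I fix it? Thanks a lot!

Then I found that Python versions < 3.5 don't support star expressions in dicts, but I need to do this in Python 2.7. Can you help?

@PythonBeggar: If you want to stay compatible with Python 2.x, you can use a dict comprehension instead of unpacking (see the update).

Thanks for your reply, but we need the data that matches file 1; the result you provided includes those without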