Merging duplicate JSON elements with Python

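I want to deduplicate image files on disk. I have a JSON file that describes pairs of duplicates (the output of a duplicate image finder). Because an image often has more than two copies, a naive automatic deletion rule applied pair by pair could end up unlinking every instance of an image. A sample JSON file looks like this: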

{"images" : [
    {"image1": "./folder1/IMG_013251.jpg", "image2": "./folder3/IMG_013251.jpg", "similarity": 100},
    {"image1": "./folder1/IMG_013251.jpg", "image2": "./folder5/IMG-WA0149.jpg", "similarity": 100},
    {"image1": "./folder1/IMG-WA0149.jpg", "image2": "./folder4/IMG-WA0125.jpg", "similarity": 100},
    {"image1": "./folder5/IMG-WA0149.jpg", "image2": "./folder4/IMG-WA0125.jpg", "similarity": 100},
    {"image1": "./folder2/IMG-WA0149.jpg", "image2": "./folder3/IMG-WA0125.jpg", "similarity": 100},
    {"image1": "./folder3/IMG_045262.jpg", "image2": "./folder8/IMG_013251.jpg", "similarity": 100},
    {"image1": "./folder4/IMG-WA0024.jpg", "image2": "./folder1/IMG-WA0079.jpg", "similarity": 100},
    {"image1": "./folder5/IMG-WA0130.jpg", "image2": "./folder4/IMG-WA0024.jpg", "similarity": 100}]}
My first idea was to modify the JSON so that it looks like this, but I could not work out the logic:


{"images" : [
    {"image1": "./folder1/IMG_013251.jpg", "image2": "./folder3/IMG_013251.jpg", "image3": "./folder5/IMG-WA0149.jpg", "similarity": 100},    
    {"image1": "./folder1/IMG-WA0149.jpg", "image2": "./folder4/IMG-WA0125.jpg", "image3": "./folder5/IMG-WA0149.jpg", "similarity": 100},  
    {"image1": "./folder2/IMG-WA0149.jpg", "image2": "./folder3/IMG-WA0125.jpg", "similarity": 100},
    {"image1": "./folder3/IMG_045262.jpg", "image2": "./folder8/IMG_013251.jpg", "similarity": 100},
    {"image1": "./folder4/IMG-WA0024.jpg", "image2": "./folder1/IMG-WA0079.jpg", "image3": "./folder5/IMG-WA0130.jpg", "similarity": 100}]}

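My initial approach was to create two lists and compare each element against the others, putting the duplicates into a dictionary. I tried this, but it did not give me useful output. I also looked at the dict.update() method, but I am not sure how to identify the duplicate dicts in the first place. How else could I approach this?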


Thanks,

One approach is to compute equivalence sets.

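Basically, assuming the similarity relation is transitive (if A duplicates B and B duplicates C, then A duplicates C), you iterate over the list of pairs and build the sets of all equivalent pictures. Then you keep one instance from each set and unlink the others.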
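To make the idea of equivalence sets concrete, here is a minimal sketch that groups items with a disjoint-set (union-find) structure. This is one common way to compute such sets, not necessarily what the answer intends; the function name `equivalence_sets` and the file names are hypothetical.

```python
def equivalence_sets(pairs):
    """Group items into equivalence sets from an iterable of (a, b) pairs."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    # Collect every item under its root to obtain the final sets.
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


# Hypothetical file names: a, b and c end up together, d and e form a second set.
pairs = [("a.jpg", "b.jpg"), ("a.jpg", "c.jpg"), ("d.jpg", "e.jpg")]
sets = equivalence_sets(pairs)
```

Because every union joins whole groups, a pair that links two previously separate groups merges them, which is exactly what transitivity requires.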

For example, the sets based on your data would include:

set1 = {"./folder1/IMG_013251.jpg", "./folder3/IMG_013251.jpg", "./folder5/IMG-WA0149.jpg", "./folder4/IMG-WA0125.jpg", "./folder1/IMG-WA0149.jpg"}
set2 = {"./folder4/IMG-WA0024.jpg", "./folder1/IMG-WA0079.jpg", "./folder5/IMG-WA0130.jpg"}
From each set, you can then choose the instance to keep and unlink the others.
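The keep-one-and-unlink-the-rest step might be sketched as follows. The helper names `plan_removals` and `unlink_all` are hypothetical, and keeping the lexicographically smallest path is only an assumed rule; any rule works as long as exactly one member of each set survives. The sketch defaults to a dry run so that nothing is deleted by accident.

```python
from pathlib import Path


def plan_removals(eq_sets):
    """Return (keep, remove): one kept path per set, all other paths removed."""
    keep, remove = [], []
    for eq_set in eq_sets:
        # Assumed rule: keep the lexicographically smallest path in each set.
        keeper, *others = sorted(eq_set)
        keep.append(keeper)
        remove.extend(others)
    return keep, remove


def unlink_all(paths, dry_run=True):
    """Delete the given files; with dry_run=True only report what would happen."""
    for p in paths:
        if dry_run:
            print(f"would remove {p}")
        else:
            Path(p).unlink(missing_ok=True)  # missing_ok requires Python 3.8+


example_sets = [
    {"./folder4/IMG-WA0024.jpg", "./folder1/IMG-WA0079.jpg", "./folder5/IMG-WA0130.jpg"},
]
keep, remove = plan_removals(example_sets)
unlink_all(remove)  # dry run: prints only, touches no files
```

Reviewing the dry-run output before calling `unlink_all(remove, dry_run=False)` is a cheap safeguard against the "all copies deleted" problem described in the question.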

One way to compute the equivalence sets with your data layout is:

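# Assumptions: `data` is the parsed JSON (e.g. the result of json.load) and
# `thresh` is the similarity cut-off; neither is defined in this snippet.
set_lists = []
for couple in data["images"]:
    if couple["similarity"] > thresh:
        pair = {couple["image1"], couple["image2"]}
        # Find every existing set that shares an image with this pair.
        overlapping = [eq_set for eq_set in set_lists if eq_set & pair]
        # Keep the untouched sets, then add the union of the pair and all
        # sets it touches, so a pair bridging two sets merges them into one.
        set_lists = [eq_set for eq_set in set_lists if not (eq_set & pair)]
        set_lists.append(pair.union(*overlapping))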