Python: find duplicates based only on specific keys/values (python, json, python-3.x)

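I am trying to use Python to flag duplicate objects in JSON based only on the values of the "price" and "full address" keys, ignoring "url". Each duplicate should then get a new "duplicate" key whose value is 1 or 2, numbering the occurrences. What is the best way to do this? Current: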

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

Expected result:

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
        "duplicate": 1,
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
        "duplicate": 2,
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

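Keep a running count for each ("price", "full address") pair, then make a second pass to remove the key from any item that turned out not to be a duplicate: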

from collections import defaultdict

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

counts = defaultdict(int)

for d in A:
    k = (d["price"], d["full address"])
    counts[k] += 1
    d["duplicate"] = counts[k]

for d in A:
    if counts[(d["price"], d["full address"])] == 1:
        del d["duplicate"]

print(A)
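
The two-pass idea above can also be packaged as a reusable function. The following is only a minimal sketch: the function name tag_duplicates, its keys parameter, and the listings variable are hypothetical and do not come from the original answer. It swaps in collections.Counter to count first and number second, so the "duplicate" key is never added to unique items and nothing has to be deleted afterwards.

```python
from collections import Counter


def tag_duplicates(records, keys=("price", "full address")):
    """Number records whose values for `keys` repeat (hypothetical helper).

    Records are modified in place and also returned for convenience.
    """
    def signature(record):
        # Only the selected keys take part in the comparison; "url" is ignored.
        return tuple(record[key] for key in keys)

    # First pass: count how many records share each signature.
    totals = Counter(signature(record) for record in records)

    # Second pass: number only the records whose signature occurs more than once.
    seen = Counter()
    for record in records:
        sig = signature(record)
        if totals[sig] > 1:
            seen[sig] += 1
            record["duplicate"] = seen[sig]
    return records


listings = [
    {"url": "google.com", "price": 550, "full address": "123 sesame st"},
    {"url": "yahoo.com", "price": 550, "full address": "123 sesame st"},
    {"url": "bing.com", "price": 250, "full address": "123 50th st"},
]
tagged = tag_duplicates(listings)
print(tagged)
```

Passing a different keys tuple changes which fields define a duplicate, under the assumption that every record contains all of the listed keys.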

Optimized answer:

Rather than making a second pass to remove the key from any non-duplicates, add the "duplicate" key only once a second occurrence of the same pair is seen. To make that possible, remember the index of the first occurrence so it can be tagged retroactively. This way we iterate over the whole list only once.

from collections import defaultdict

A = [
    {
        "url": "google.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "yahoo.com",
        "price": 550,
        "full address": "123 sesame st",
    },
    {
        "url": "bing.com",
        "price": 250,
        "full address": "123 50th st",
    },
]

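# Maps each (price, full address) pair to its running count and first index.
counts = defaultdict(dict)
for index, d in enumerate(A):
    k = (d["price"], d["full address"])
    counts[k]["count"] = counts[k].get("count", 0) + 1
    if counts[k]["count"] == 1:
        # First sighting: remember where it is, but do not tag it yet.
        counts[k]["first_occurrence"] = index
    else:
        # A repeat: tag the first occurrence as 1 and this one with its count.
        A[counts[k]["first_occurrence"]]["duplicate"] = 1
        d["duplicate"] = counts[k]["count"]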

print(A)

Output (key order within each dict may vary with the Python version):

[{'full address': '123 sesame st', 'duplicate': 1, 'price': 550, 'url': 'google.com'}, {'full address': '123 sesame st', 'duplicate': 2, 'price': 550, 'url': 'yahoo.com'}, {'full address': '123 50th st', 'price': 250, 'url': 'bing.com'}]
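
To check how the single-pass approach behaves when a pair occurs more than twice, here is a small sketch along the same lines. The name tag_duplicates_single_pass, the keys parameter, and the example.com-style URLs are all made up for illustration. As a variation on the answer above, it keeps a reference to the first dict itself instead of its index, which is an assumption that works equally well because the dicts are mutated in place.

```python
def tag_duplicates_single_pass(records, keys=("price", "full address")):
    """Tag repeats in one pass (hypothetical helper, modifies records in place)."""
    first_seen = {}  # signature -> first record seen with that signature
    counts = {}      # signature -> number of occurrences so far
    for record in records:
        sig = tuple(record[key] for key in keys)
        counts[sig] = counts.get(sig, 0) + 1
        if counts[sig] == 1:
            # Not known to be a duplicate yet, so leave it untagged for now.
            first_seen[sig] = record
        else:
            # Retroactively tag the first occurrence, then number this one.
            first_seen[sig]["duplicate"] = 1
            record["duplicate"] = counts[sig]
    return records


rows = [
    {"url": "a.example", "price": 550, "full address": "123 sesame st"},
    {"url": "b.example", "price": 250, "full address": "123 50th st"},
    {"url": "c.example", "price": 550, "full address": "123 sesame st"},
    {"url": "d.example", "price": 550, "full address": "123 sesame st"},
]
tag_duplicates_single_pass(rows)
numbers = [row.get("duplicate") for row in rows]
print(numbers)  # [1, None, 2, 3]
```

The unique row stays untagged, and a third occurrence is numbered 3, so the numbering is not limited to 1 and 2.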