Python: aggregating dicts in a list based on key values

python list dictionary

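I'm trying to wrap my head around this. I have a list containing several dictionaries, and I want to aggregate those dictionaries based on two of their values. Example code: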

>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 44 },
...     { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 46 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
...     ]

I'm trying to aggregate the dicts that share the same age and regex, summing the count key across all matching instances. Example output:

>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 244 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 90 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
...     ]

I'd like to do this without pandas or other add-on modules; if possible, I'd prefer a solution that uses only the standard library.

Thanks

You can use collections.defaultdict:

from collections import defaultdict
d = defaultdict(int)
data = [{'regex': '.*ccc-r.*', 'age': 44, 'count': 224}, {'regex': '.*nft-r.*', 'age': 23, 'count': 44}, {'regex': '.*ccc-r.*', 'age': 44, 'count': 20}, {'regex': '.*ccc-r.*', 'age': 32, 'count': 16}, {'regex': '.*nft-r.*', 'age': 23, 'count': 46}, {'regex': '.*zxy-r.*', 'age': 16, 'count': 55}]
for i in data:
   d[(i['regex'], i['age'])] += i['count']

r = [{'regex':a, 'age':b, 'count':c} for (a, b), c in d.items()]

Output:

[{'regex': '.*ccc-r.*', 'age': 44, 'count': 244}, 
 {'regex': '.*nft-r.*', 'age': 23, 'count': 90}, 
 {'regex': '.*ccc-r.*', 'age': 32, 'count': 16}, 
 {'regex': '.*zxy-r.*', 'age': 16, 'count': 55}]
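The same idea can be wrapped in a small reusable function. The following is only a sketch: the name `aggregate_by` and its parameters `key_fields` and `value_field` are hypothetical and not part of the answer above. It assumes every record contains all of the listed fields and that the summed values are integers.

```python
from collections import defaultdict


def aggregate_by(records, key_fields, value_field):
    """Sum value_field over all records that share the same key_fields values."""
    totals = defaultdict(int)
    for record in records:
        # A tuple of the grouping values is hashable, so it can serve as a dict key.
        key = tuple(record[field] for field in key_fields)
        totals[key] += record[value_field]
    # Rebuild one dict per group. Dicts keep insertion order (Python 3.7+),
    # so the groups come out in the order they were first seen.
    return [
        {**dict(zip(key_fields, key)), value_field: total}
        for key, total in totals.items()
    ]


data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

result = aggregate_by(data, ("regex", "age"), "count")
print(result)
```

Passing the grouping fields as an argument means the same helper could group on regex alone or on any other combination of fields, if that is ever needed.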

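Assuming you don't want to use any imports, you can first collect the data in a dictionary, aggregated_data, whose keys are (regex, age) tuples and whose values are the summed counts. Once that dictionary is built, you can restore the original structure: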
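data = [
    { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
    { "regex": ".*nft-r.*", "age": 23, "count": 44 },
    { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
    { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
    { "regex": ".*nft-r.*", "age": 23, "count": 46 },
    { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
]

aggregated_data = {}

for dictionary in data:
    # Group on the (regex, age) pair; dict.get supplies 0 for a new key.
    key = (dictionary['regex'], dictionary['age'])
    aggregated_data[key] = aggregated_data.get(key, 0) + dictionary['count']

# Turn each (key, total) pair back into a dict of the original shape.
data = [{'regex': key[0], 'age': key[1], 'count': value} for key, value in aggregated_data.items()]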
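To make the intermediate step concrete, here is a minimal sketch that builds only the tuple-keyed dictionary for the sample data and prints it. The variable names `record` and `totals` are illustrative choices, not taken from the answer.

```python
data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

totals = {}
for record in data:
    # Records with equal regex and age produce equal tuples, so they
    # land on the same key and their counts accumulate.
    key = (record["regex"], record["age"])
    totals[key] = totals.get(key, 0) + record["count"]

for key, total in totals.items():
    print(key, total)
# ('.*ccc-r.*', 44) 244
# ('.*nft-r.*', 23) 90
# ('.*ccc-r.*', 32) 16
# ('.*zxy-r.*', 16) 55
```

Tuples work as keys here because they are hashable and compare by value, whereas a list such as [regex, age] would raise a TypeError when used as a dict key.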
You can also try:

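agg = {}

for d in data:
    # Group on both regex and age; keying on regex alone would merge
    # the age-32 and age-44 ".*ccc-r.*" entries.
    key = (d['regex'], d['age'])
    if key in agg:
        agg[key]['count'] += d['count']
    else:
        agg[key] = dict(d)  # copy so the dicts in data are not modified

print(list(agg.values()))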

If you're not opposed to using a library (and to a slightly different output format), you can use pandas:

import pandas as pd
df = pd.DataFrame(data)
df.groupby(['regex', 'age']).sum()

This produces:

               count
regex     age
.*ccc-r.* 32      16
          44     244
.*nft-r.* 23      90
.*zxy-r.* 16      55

This is beautiful, thanks for the reply!