Merging dict rows with a SUM in Python

I have a lot of dict rows, more than 10 million, that look like this:

{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'}
{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'}
{'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'}
{'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}

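Is it possible to merge all rows whose other keys and values are identical into a single row holding the sum of bytes? I want to reduce the number of rows as much as possible, which should speed up the next processing step. The result should look like this:

{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '60'}
{'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'}
{'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}

Thanks in advance.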

The following code should work:

from collections import defaultdict

lst = [{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
       {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'},
       {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
       {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]
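keys = ['value_01', 'value_02', 'datacenter']  # fields that identify a group
data = defaultdict(int)                        # missing groups start at 0
for entry in lst:
    # Build a hashable group key from the values of the identifying fields
    key = tuple(entry[k] for k in keys)
    data[key] += int(entry['bytes'])           # 'bytes' is stored as a string
print(data)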

Output:

defaultdict(<class 'int'>, {('123', '456', '1'): 60, ('678', '901', '2'): 55, ('678', '456', '2'): 15})
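
The result above is keyed by tuples, while the question asks for dict rows. As a minimal sketch (the names `rows`, `totals`, and `merged_rows` are illustrative, not from the original answer), the totals can be turned back into rows by zipping each tuple with the key names:

```python
from collections import defaultdict

rows = [{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
        {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'},
        {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
        {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]
keys = ['value_01', 'value_02', 'datacenter']

# Sum 'bytes' per combination of the identifying fields
totals = defaultdict(int)
for row in rows:
    totals[tuple(row[k] for k in keys)] += int(row['bytes'])

# Rebuild one dict per combination; 'bytes' is converted back to a string
# only to match the row format shown in the question
merged_rows = [
    {**dict(zip(keys, combo)), 'bytes': str(total)}
    for combo, total in totals.items()
]
print(merged_rows)
```

Because dicts preserve insertion order (Python 3.7+), the merged rows come out in the order each combination first appeared.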

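Using an intermediate dictionary indexed on all the "other" keys, you can accumulate the "bytes" value for each combination of the other fields in a common dictionary, then convert the indexed values back into a list of dictionaries: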

lst = [{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
       {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'},
       {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
       {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]

merged = dict()
for d in lst:
    k = map(d.get,sorted({*d}-{"bytes"}))  # index on all other fields
    m = merged.setdefault(tuple(k),d)      # add/get first instance
    if m is not d:                         # accumulate bytes (as strings) 
        m['bytes'] = str(int(m['bytes']) + int(d['bytes']))
mergedList = list(merged.values())

print(mergedList)
[{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '60'},
 {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
 {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]

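This works even if your data is not grouped by the combination of the other fields, and it needs no sorting of the rows, so it runs in O(n) time. It also works if the keys appear in a different order from row to row. Missing keys could be a problem, but you can account for them by using a comprehension instead of map(d.get, ...).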
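
One possible reading of that comprehension suggestion, sketched here with a hypothetical helper `merge_rows` that is not part of the original answer: index on sorted (field name, value) pairs rather than on values alone, so a row that lacks a field cannot collide with a row that has a different field holding the same value. Unlike the code above, this version does not modify the input dicts.

```python
def merge_rows(rows, sum_field='bytes'):
    """Merge rows that agree on every field except sum_field (illustrative helper)."""
    totals = {}
    for row in rows:
        # Sorted (name, value) pairs make the key independent of field order
        # and keep field names in the key, so missing fields stay distinct
        key = tuple(sorted((k, v) for k, v in row.items() if k != sum_field))
        # Assumption: a row without sum_field contributes 0
        totals[key] = totals.get(key, 0) + int(row.get(sum_field, 0))
    # Rebuild the rows; the sum is stored as a string, as in the question
    return [{**dict(key), sum_field: str(total)} for key, total in totals.items()]


sample = [
    {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
    {'datacenter': '1', 'value_02': '456', 'value_01': '123', 'bytes': '35'},
    {'value_01': '678', 'datacenter': '2', 'bytes': '15'},  # no 'value_02'
]
merged_sample = merge_rows(sample)
print(merged_sample)
```

The field values must be hashable for this to work, which holds for the string values in the question.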

Note that you really should store the byte counts as integers rather than strings.

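Please repeat [...] and [...] from [...]. "Show me how to solve this coding problem" is not a Stack Overflow issue. We expect you to make an honest attempt and then ask a specific question about your algorithm or technique. Stack Overflow is not intended to replace existing documentation and tutorials. That said, most applications would use pandas (a DataFrame) for this. If you want to go that way, please work through a pandas tutorial and note the groupby and sum methods.

Yes, this is possible. What have you tried so far?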
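
For reference, the pandas route mentioned in the comments might look like the sketch below. It assumes pandas is installed and that the rows fit in memory; the variable names are illustrative only.

```python
import pandas as pd

rows = [{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
        {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'},
        {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
        {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]
keys = ['value_01', 'value_02', 'datacenter']

df = pd.DataFrame(rows)
df['bytes'] = df['bytes'].astype(int)  # sum numbers, not strings

# as_index=False keeps the key fields as ordinary columns;
# sort=False keeps groups in order of first appearance
summed = df.groupby(keys, as_index=False, sort=False)['bytes'].sum()

merged_records = summed.to_dict('records')  # back to a list of dict rows
print(merged_records)
```

Here `bytes` stays an integer in the result, in line with the advice above about not storing counts as strings.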