Aggregating a list of dictionaries into bins with a weighted average in Python

I have a list of dictionaries that looks like this:

_input = [{'cumulated_quantity': 30, 'price': 7000, 'quantity': 30},
         {'cumulated_quantity': 80, 'price': 7002, 'quantity': 50},
         {'cumulated_quantity': 130, 'price': 7010, 'quantity': 50},
         {'cumulated_quantity': 330, 'price': 7050, 'quantity': 200},
         {'cumulated_quantity': 400, 'price': 7065, 'quantity': 70}]

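I want to aggregate the dictionaries into bins of quantity 100, where the price of each bin is the quantity-weighted average of the prices that fall into it. The result should look like this: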

result = [{'cumulated_quantity': 100, 'price': 7003, 'quantity': 100},
          {'cumulated_quantity': 200, 'price': 7038, 'quantity': 100},
          {'cumulated_quantity': 300, 'price': 7050, 'quantity': 100},
          {'cumulated_quantity': 400, 'price': 7060.5, 'quantity': 100}]

The weighted averages in the result dictionaries are computed as follows:

7003 = (30*7000+50*7002+20*7010)/100 
7038 = (30*7010+70*7050)/100
7050 = 100*7050/100
7060.5 = (30*7050+70*7065)/100
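The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `weighted_average` and the `(quantity, price)` tuple layout are assumptions made for illustration and do not appear in the question.

```python
def weighted_average(parts):
    """Quantity-weighted average price of (quantity, price) pairs."""
    total_quantity = sum(quantity for quantity, _ in parts)
    total_value = sum(quantity * price for quantity, price in parts)
    return total_value / total_quantity


# The four bins from the question, written as (quantity, price) pairs.
bins = [
    [(30, 7000), (50, 7002), (20, 7010)],
    [(30, 7010), (70, 7050)],
    [(100, 7050)],
    [(30, 7050), (70, 7065)],
]
averages = [weighted_average(parts) for parts in bins]
print(averages)
```

Each entry of `averages` corresponds to one line of the calculation above.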

I managed to get this result with a pandas DataFrame, but it is too slow (about 0.5 seconds). Is there a fast way to do this in Python?

Without pandas, doing it by hand is almost instantaneous:

bin_size = 100  # size of each bin; assumed to be a positive integer
result = []
cumulated_quantity = 0
bucket = {'price': 0.0, 'quantity': 0}
for dct in _input:
    dct_quantity = dct['quantity']  # local copy, so the input is not modified
    while dct_quantity > 0:
        if bucket['quantity'] == bin_size:
            # The current bin is full: store it and start a new one.
            bucket['cumulated_quantity'] = cumulated_quantity
            result.append(bucket)
            bucket = {'price': 0.0, 'quantity': 0}
        # Take as much of this entry as still fits into the current bin.
        added_quantity = min(dct_quantity, bin_size - bucket['quantity'])
        # Update the running weighted average price of the bin.
        bucket['price'] = (bucket['price'] * bucket['quantity'] + dct['price'] * added_quantity) / (bucket['quantity'] + added_quantity)
        dct_quantity -= added_quantity
        bucket['quantity'] += added_quantity
        cumulated_quantity += added_quantity
if bucket['quantity'] != 0:
    # Store the last bin, which may be only partially filled.
    bucket['cumulated_quantity'] = cumulated_quantity
    result.append(bucket)

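For the input from the question, this gives four bins with the prices 7003.0, 7038.0, 7050.0 and 7060.5.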

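This runs in linear time, O(p), where p is the number of parts (equivalently O(n*k), where n is the number of dictionaries and k is the average number of pieces each dictionary has to be split into; in your example k = 1.6).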

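How are the bins defined? Is the bin with 'cumulated_quantity': 100 meant for 'cumulated_quantity' …

The bin size is chosen arbitrarily; it will be a variable. If the cumulated quantity …

Compared to this, my initial solution was so slow. Great answer, thanks.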