Hierarchical grouping of key-value pairs in Python


I have a list like this:

data = [
{'date':'2017-01-02', 'model': 'iphone5', 'feature':'feature1'},
{'date':'2017-01-02', 'model': 'iphone7', 'feature':'feature2'},
{'date':'2017-01-03', 'model': 'iphone6', 'feature':'feature2'},
{'date':'2017-01-03', 'model': 'iphone6', 'feature':'feature2'},
{'date':'2017-01-03', 'model': 'iphone7', 'feature':'feature3'},
{'date':'2017-01-10', 'model': 'iphone7', 'feature':'feature2'},
{'date':'2017-01-10', 'model': 'iphone7', 'feature':'feature1'},
]
I want to turn it into this:

[
   {
      '2017-01-02':[{'iphone5':['feature1']}, {'iphone7':['feature2']}]
   },
   {
      '2017-01-03': [{'iphone6':['feature2']}, {'iphone7':['feature3']}]
   },
   {
      '2017-01-10':[{'iphone7':['feature2', 'feature1']}]
   }
]
I need an efficient way to do this, because there may be a lot of data.

I tried this:

import itertools
from operator import itemgetter
data = sorted(data, key=itemgetter('date'))
date = itertools.groupby(data, key=itemgetter('date'))
But I get nothing for the values under the 'date' key.

Later, I will iterate over this structure to build HTML.
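
For context on the attempt above: itertools.groupby returns a lazy iterator of (key, group) pairs, and each group is itself a one-shot iterator, so the returned object shows nothing useful until it is looped over. A minimal sketch of consuming it follows; `sample` is a shortened, hypothetical list in the same shape as `data`, and `by_date` is an illustrative name, not part of the original question.

```python
from itertools import groupby
from operator import itemgetter

# Shortened, hypothetical sample in the same shape as `data`.
sample = [
    {'date': '2017-01-03', 'model': 'iphone6', 'feature': 'feature2'},
    {'date': '2017-01-02', 'model': 'iphone5', 'feature': 'feature1'},
    {'date': '2017-01-02', 'model': 'iphone7', 'feature': 'feature2'},
]

# groupby only merges adjacent records, so sort by the same key first.
sample_sorted = sorted(sample, key=itemgetter('date'))

by_date = {}
for date, records in groupby(sample_sorted, key=itemgetter('date')):
    # `records` is a one-shot iterator; materialize it before moving on.
    by_date[date] = [(r['model'], r['feature']) for r in records]

print(by_date)
```

Each group has to be read inside the loop body, because advancing groupby to the next key invalidates the previous group iterator.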

total_result = list()
result = dict()
inner_value = dict()

for d in data:
    if d["date"] not in result:
        if result:
            total_result.append(result)
        result = dict()
        result[d["date"]] = set()
        inner_value = dict()

    if d["model"] not in inner_value:
        inner_value[d["model"]] = set()

    inner_value[d["model"]].add(d["feature"])
    tmp_v = [{key: list(inner_value[key])} for key in inner_value]
    result[d["date"]] = tmp_v

total_result.append(result)
total_result:

[{'2017-01-02': [{'iphone7': ['feature2']}, {'iphone5': ['feature1']}]},
 {'2017-01-03': [{'iphone6': ['feature2']}, {'iphone7': ['feature3']}]},
 {'2017-01-10': [{'iphone7': ['feature2', 'feature1']}]}]
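
The loop above starts a new result dict whenever it meets a date that is not the current one, so it appears to rely on records with the same date being adjacent. A minimal sketch, assuming the input may arrive unsorted, is to sort by date before running the same kind of loop. The names `group_sorted` and `sample` are hypothetical, and lists are used instead of sets so that features keep their first-seen order.

```python
def group_sorted(records):
    """Group records by date, then by model, after sorting by date."""
    total = []
    current_date, entries, index = None, None, None
    for d in sorted(records, key=lambda r: r['date']):
        if d['date'] != current_date:
            # New date: start a fresh list of {model: [features]} dicts.
            current_date, entries, index = d['date'], [], {}
            total.append({current_date: entries})
        if d['model'] not in index:
            # `index` maps model -> the same feature list stored in `entries`.
            index[d['model']] = []
            entries.append({d['model']: index[d['model']]})
        if d['feature'] not in index[d['model']]:
            index[d['model']].append(d['feature'])
    return total


# Hypothetical unsorted sample with one duplicate record.
sample = [
    {'date': '2017-01-10', 'model': 'iphone7', 'feature': 'feature2'},
    {'date': '2017-01-02', 'model': 'iphone5', 'feature': 'feature1'},
    {'date': '2017-01-10', 'model': 'iphone7', 'feature': 'feature1'},
    {'date': '2017-01-02', 'model': 'iphone5', 'feature': 'feature1'},
]
grouped = group_sorted(sample)
print(grouped)
```

The membership test on a list is linear in the number of features per model, which is fine when each model has only a handful of features; a set alongside the list would be one option if that number grows.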

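You can try this. Here is my approach: td is a dict that stores {model: index}, which is used to check whether a model already has an entry in the list of dicts: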

from itertools import groupby
from operator import itemgetter

r = []
for i in groupby(sorted(data, key=itemgetter('date')), key=itemgetter('date')):
    td, tl = {}, []
    for j in i[1]:
        if j["model"] not in td:
            tl.append({j["model"]: [j["feature"]]})
            td[j["model"]] = len(tl) - 1
        elif j["feature"] not in tl[td[j["model"]]][j["model"]]:
            tl[td[j["model"]]][j["model"]].append(j["feature"])
    r.append({i[0]: tl})
Result:

[
  {'2017-01-02': [{'iphone5': ['feature1']}, {'iphone7': ['feature2']}]},
  {'2017-01-03': [{'iphone6': ['feature2']}, {'iphone7': ['feature3']}]},
  {'2017-01-10': [{'iphone7': ['feature2', 'feature1']}]}
]

In fact, I think the data structure could be simplified; you may not need this much nesting.
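
As one possible reading of that simplification, the sketch below uses a plain nested dict of the form {date: {model: [features]}} instead of lists of single-key dicts. This is an assumption about what a simpler shape could look like, not the structure requested in the question; `group_nested` and `sample` are hypothetical names.

```python
def group_nested(records):
    """Return {date: {model: [features]}} with duplicate features dropped."""
    grouped = {}
    for d in records:
        # setdefault creates the inner dict / list on first use.
        features = grouped.setdefault(d['date'], {}).setdefault(d['model'], [])
        if d['feature'] not in features:
            features.append(d['feature'])
    return grouped


# Hypothetical sample in the same shape as `data`.
sample = [
    {'date': '2017-01-03', 'model': 'iphone6', 'feature': 'feature2'},
    {'date': '2017-01-03', 'model': 'iphone6', 'feature': 'feature2'},
    {'date': '2017-01-10', 'model': 'iphone7', 'feature': 'feature2'},
    {'date': '2017-01-10', 'model': 'iphone7', 'feature': 'feature1'},
]
nested = group_nested(sample)

# Sorting at iteration time keeps the output order independent of input order.
for date in sorted(nested):
    for model in sorted(nested[date]):
        print('{} {} {}'.format(date, model, nested[date][model]))
```

With this shape the input does not need to be sorted first, and looking up a given date or model is a direct dict access.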

You can do this efficiently and cleanly with defaultdict. Unfortunately, this is a fairly advanced usage and it can be hard to read.

from collections import defaultdict
from pprint import pprint

# create a dictionary whose elements are automatically dictionaries of sets
result_dict = defaultdict(lambda: defaultdict(set))

# Construct a dictionary with one key for each date and another dict ('model_dict') 
# as the value.
# The model_dict has one key for each model and a set of features as the value.
for d in data:
    result_dict[d["date"]][d["model"]].add(d["feature"])

# more explicit version:
# for d in data:
#     model_dict = result_dict[d["date"]]   # created automatically if needed
#     feature_set = model_dict[d["model"]]  # created automatically if needed
#     feature_set.add(d["feature"])

# convert the result_dict into the required form
result_list = [
    {   
        date: [
            {phone: list(feature_set)} 
                for phone, feature_set in sorted(model_dict.items())
        ]
    } for date, model_dict in sorted(result_dict.items())
]

pprint(result_list)
# Example output; feature order within each list may vary because sets are unordered:
# [{'2017-01-02': [{'iphone5': ['feature1']}, {'iphone7': ['feature2']}]},
#  {'2017-01-03': [{'iphone6': ['feature2']}, {'iphone7': ['feature3']}]},
#  {'2017-01-10': [{'iphone7': ['feature2', 'feature1']}]}]

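If you are able to use it, you should consider it.

A dictionary that defaults to a set might help…

@Aikanaro, I think this will only work if the data has been sorted by date beforehand.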