Python: comparing lists of dictionaries by value

I have two lists of dictionaries:

old_data = [{'company': 'Amazon', 'logged_in': '2019-01-20'},
            {'company': 'Facebook', 'logged_in': '2019-04-20'},
            {'company': 'Google', 'logged_in': '2019-04-20'}]

new_data = [{'company': 'Amazon', 'logged_in': '2019-01-26'},
            {'company': 'Facebook', 'logged_in': '2019-04-12'},
            {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
            {'company': 'Wiki', 'logged_in': '2019-04-20'}]
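I only want elements of new_data in the following cases:

  • the company in new_data is not in old_data
  • the company is in both new_data and old_data, and its logged_in time in new_data is later than the one in old_data

Expected result: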

    [{'company': 'Amazon', 'logged_in': '2019-01-26'},
     {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
     {'company': 'Wiki', 'logged_in': '2019-04-20'}]
    
So far, I have tried:

    filter_data = []
    for nd in new_data:
        if nd['company'] not in [d['company'] for d in old_data]:
            filter_data.append(nd)
        elif nd['company'] in [d['company'] for d in old_data]:
            date_ = ...  # logged_in time of the company from old_data
            if nd['logged_in'] > date_:
                filter_data.append(nd)
    filter_data
    
Desired output:

    [{'company': 'Amazon', 'logged_in': '2019-01-26'},
     {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
     {'company': 'Wiki', 'logged_in': '2019-04-20'}]
    
Convert old_data to a dictionary for easy lookup. Here is one way to do it, assuming company names in old_data are not repeated:

        old_data = [{'company': 'Amazon', 'logged_in': '2019-01-20'},
                    {'company': 'Facebook', 'logged_in': '2019-04-20'},
                    {'company': 'Google', 'logged_in': '2019-04-20'}]
        
        new_data = [{'company': 'Amazon', 'logged_in': '2019-01-26'},
                    {'company': 'Facebook', 'logged_in': '2019-04-12'},
                    {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
                    {'company': 'Wiki', 'logged_in': '2019-04-20'}]
        # Make dictionary mapping company names to logged in times
        old_data_dict = {d['company']: d['logged_in'] for d in old_data}
        # Make result by comparing logged in times to previous value or empty string
        result = [d for d in new_data if d['logged_in'] > old_data_dict.get(d['company'], '')]
        # Print result
        print(*result, sep='\n')
        # {'company': 'Amazon', 'logged_in': '2019-01-26'}
        # {'company': 'LinkedIn', 'logged_in': '2019-04-20'}
        # {'company': 'Wiki', 'logged_in': '2019-04-20'}
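Why the '' default works: when a company is missing from old_data, old_data_dict.get() returns the empty string, and every non-empty string compares greater than '', so a single comparison covers both rules. A quick check with dates from the question (the dictionary literal below is simply old_data_dict written out by hand):

```python
# old_data_dict as built from the question's old_data
old_data_dict = {'Amazon': '2019-01-20', 'Facebook': '2019-04-20', 'Google': '2019-04-20'}

# Company missing from old_data: .get() falls back to '', and any non-empty string is > ''
print('2019-04-20' > old_data_dict.get('LinkedIn', ''))   # True  -> kept
# Company present in both lists: compare against its stored date
print('2019-01-26' > old_data_dict.get('Amazon', ''))     # True  -> kept
print('2019-04-12' > old_data_dict.get('Facebook', ''))   # False -> dropped
```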
        
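Edit: if old_data may contain several dictionaries with the same company name, you can define old_data_dict as follows instead, keeping the latest logged_in for each company: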

        old_data_dict = {}
        for d in old_data:
            old_data_dict[d['company']] = max(d['logged_in'],
                                              old_data_dict.get(d['company'], ''))
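A quick sketch of the duplicate-aware version, using hypothetical sample data in which 'Amazon' appears twice in old_data. Because the later of its two old dates (2019-02-01) is after the new one (2019-01-26), Amazon is now dropped:

```python
# Hypothetical sample data: 'Amazon' appears twice in old_data
old_data = [{'company': 'Amazon', 'logged_in': '2019-01-20'},
            {'company': 'Amazon', 'logged_in': '2019-02-01'},
            {'company': 'Facebook', 'logged_in': '2019-04-20'}]

new_data = [{'company': 'Amazon', 'logged_in': '2019-01-26'},
            {'company': 'Facebook', 'logged_in': '2019-04-22'},
            {'company': 'Wiki', 'logged_in': '2019-04-20'}]

# Keep only the latest logged_in per company
old_data_dict = {}
for d in old_data:
    old_data_dict[d['company']] = max(d['logged_in'],
                                      old_data_dict.get(d['company'], ''))

result = [d for d in new_data if d['logged_in'] > old_data_dict.get(d['company'], '')]
print(*result, sep='\n')
# {'company': 'Facebook', 'logged_in': '2019-04-22'}
# {'company': 'Wiki', 'logged_in': '2019-04-20'}
```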
        

If you build a reverse-lookup dictionary, old_dic, keyed by company name, it becomes simple:

        old_data = [{'company': 'Amazon', 'logged_in': '2019-01-20'},
                    {'company': 'Facebook', 'logged_in': '2019-04-20'},
                    {'company': 'Google', 'logged_in': '2019-04-20'}]
        
        new_data = [{'company': 'Amazon', 'logged_in': '2019-01-26'},
                    {'company': 'Facebook', 'logged_in': '2019-04-12'},
                    {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
                    {'company': 'Wiki', 'logged_in': '2019-04-20'}]
        
        old_dic = {o["company"]: {"logged_in": o["logged_in"]} for o in old_data}
        
        result = [
            n for n in new_data
            if n["company"] not in old_dic or
               n["logged_in"] > old_dic[n["company"]]["logged_in"]
        ]
        
        
Output:

        [{'company': 'Amazon', 'logged_in': '2019-01-26'},
         {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
         {'company': 'Wiki', 'logged_in': '2019-04-20'}]
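The same condition can also be written as a generator that spells out the two rules from the question one at a time. A sketch (newer_entries is a name made up for this example) reusing old_dic and the question's sample data:

```python
old_data = [{'company': 'Amazon', 'logged_in': '2019-01-20'},
            {'company': 'Facebook', 'logged_in': '2019-04-20'},
            {'company': 'Google', 'logged_in': '2019-04-20'}]

new_data = [{'company': 'Amazon', 'logged_in': '2019-01-26'},
            {'company': 'Facebook', 'logged_in': '2019-04-12'},
            {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
            {'company': 'Wiki', 'logged_in': '2019-04-20'}]

old_dic = {o["company"]: {"logged_in": o["logged_in"]} for o in old_data}


def newer_entries(new_data, old_dic):
    """Yield entries of new_data that satisfy either rule from the question."""
    for n in new_data:
        old = old_dic.get(n["company"])
        if old is None:
            # Rule 1: the company does not appear in old_data
            yield n
        elif n["logged_in"] > old["logged_in"]:
            # Rule 2: the company is in both lists and the new login is later
            yield n


result = list(newer_entries(new_data, old_dic))
print(result)
```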
        

Picking up where you left off:

        def find_logged_in(company, olddata):
            for od in olddata:
                if od['company']==company:
                    return od['logged_in']
            return None
        
        
        filter_data = []
        for nd in new_data:
            if nd['company'] not in [d['company'] for d in old_data]:
                filter_data.append(nd)
            elif nd['company'] in [d['company'] for d in old_data]:
                date_ = find_logged_in(nd['company'], old_data) 
                if nd['logged_in'] > date_:
                    filter_data.append(nd)
        filter_data
        
Result:

        [{'company': 'Amazon', 'logged_in': '2019-01-26'},
         {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
         {'company': 'Wiki', 'logged_in': '2019-04-20'}]
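The helper function can also be replaced by next() with a default value, which removes the two separate membership checks. A minimal sketch on the same sample data; like find_logged_in, it uses the first matching entry in old_data and still scans old_data once per new entry, so the dictionary-based answers scale better for large lists:

```python
old_data = [{'company': 'Amazon', 'logged_in': '2019-01-20'},
            {'company': 'Facebook', 'logged_in': '2019-04-20'},
            {'company': 'Google', 'logged_in': '2019-04-20'}]

new_data = [{'company': 'Amazon', 'logged_in': '2019-01-26'},
            {'company': 'Facebook', 'logged_in': '2019-04-12'},
            {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
            {'company': 'Wiki', 'logged_in': '2019-04-20'}]

filter_data = []
for nd in new_data:
    # First matching old date, or None if the company is not in old_data
    date_ = next((od['logged_in'] for od in old_data
                  if od['company'] == nd['company']), None)
    if date_ is None or nd['logged_in'] > date_:
        filter_data.append(nd)
print(filter_data)
```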
        

You can do this with pandas:

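        from pandas import DataFrame
        # Latest logged_in per company across both lists
        a = DataFrame(new_data + old_data).groupby('company', as_index=False).max().to_dict('records')
        # Keep only records that do not already appear in old_data
        filter_data = [x for x in a if x not in old_data]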
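To see what the pandas version computes, here is a rough pure-Python equivalent (an illustration, not part of the original answer): take the latest logged_in per company over both lists, then keep only the records that do not already appear in old_data. Sorting by company name mirrors the default sort=True of pandas groupby.

```python
old_data = [{'company': 'Amazon', 'logged_in': '2019-01-20'},
            {'company': 'Facebook', 'logged_in': '2019-04-20'},
            {'company': 'Google', 'logged_in': '2019-04-20'}]

new_data = [{'company': 'Amazon', 'logged_in': '2019-01-26'},
            {'company': 'Facebook', 'logged_in': '2019-04-12'},
            {'company': 'LinkedIn', 'logged_in': '2019-04-20'},
            {'company': 'Wiki', 'logged_in': '2019-04-20'}]

# Latest logged_in per company across both lists (what groupby('company').max() yields)
latest = {}
for d in new_data + old_data:
    latest[d['company']] = max(d['logged_in'], latest.get(d['company'], ''))

# One record per company, sorted by company name
a = [{'company': c, 'logged_in': t} for c, t in sorted(latest.items())]

# A record that already exists in old_data means no newer login, so drop it
filter_data = [x for x in a if x not in old_data]
print(filter_data)
```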
        

You can skip the datetime conversion and just compare the strings, since they are in ISO 8601 format.
@benvc Thanks… I hadn't thought of that.
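A small check of that claim: for zero-padded YYYY-MM-DD strings, plain string comparison agrees with comparing datetime.date objects (date.fromisoformat needs Python 3.7 or later). The last line shows the assumption the shortcut relies on: without zero-padding, the order breaks.

```python
from datetime import date

# (new, old) pairs taken from the question's data
pairs = [('2019-01-26', '2019-01-20'),  # Amazon
         ('2019-04-12', '2019-04-20'),  # Facebook
         ('2019-04-20', '2019-04-20')]  # equal dates

for new, old in pairs:
    as_strings = new > old
    as_dates = date.fromisoformat(new) > date.fromisoformat(old)
    # Zero-padded YYYY-MM-DD strings sort in the same order as the dates they denote
    assert as_strings == as_dates
    print(new, '>', old, '->', as_strings)

# Without zero-padding the shortcut breaks: April compares as later than October
print('2019-4-20' > '2019-10-01')  # True, although the April date is earlier
```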