Python: remove duplicates from a list of dicts

I can currently remove duplicates when there is no key in front of the nested dictionary. An example of a list of dicts that works with this function:

 [{'asndb_prefix': '164.39.xxx.0/17',
  'cidr': '164.39.xxx.0/17',
  'cymru_asn': 'XXX',
  'cymru_country': 'GB',
  'cymru_owner': 'XXX , GB',
  'cymru_prefix': '164.39.xxx.0/17',
  'ips': ['164.39.xxx.xxx'],
  'network_id': '164.39.xxx.xxx/24',},
 {'asndb_prefix': '54.192.xxx.xxx/16',
  'cidr': '54.192.0.0/16',
  'cymru_asn': '16509',
  'cymru_country': 'US',
  'cymru_owner': 'AMAZON-02 - Amazon.com, Inc., US',
  'cymru_prefix': '54.192.144.0/22',
  'ips': ['54.192.xxx.xxx', '54.192.xxx.xxx'],
  'network_id': '54.192.xxx.xxx/24',
  }]

def remove_dict_duplicates(list_of_dicts):
    """
    Remove duplicate dicts from a list.
    """
    list_of_dicts = [dict(t) for t in set([tuple(d.items()) for d in list_of_dicts])]
    # remove the {} before and after - not sure why these end up as
    # the first and last element
    return list_of_dicts[1:-1]
However, I would like to be able to remove duplicates based on a key and all of the values associated with it in that dictionary. So if the same key holds different values I don't want to remove it, but if there is a complete duplicate, remove it:

    [{'50.16.xxx.0/24': {'asndb_prefix': '50.16.0.0/16',
   'cidr': '50.16.0.0/14',
   'cymru_asn': 'xxxx',
   'cymru_country': 'US',
   'cymru_owner': 'AMAZON-AES - Amazon.com, Inc., US',
   'cymru_prefix': '50.16.0.0/16',
   'ip': '50.16.221.xxx',
   'network_id': '50.16.xxx.0/24',
   'pyasn_asn': xxxx,
   'whois_asn': 'xxxx'}},
   # This would be removed
   {'50.16.xxx.0/24': {'asndb_prefix': '50.16.0.0/16',
   'cidr': '50.16.0.0/14',
   'cymru_asn': 'xxxxx',
   'cymru_country': 'US',
   'cymru_owner': 'AMAZON-AES - Amazon.com, Inc., US',
   'cymru_prefix': '50.16.0.0/16',
   'ip': '50.16.221.xxx',
   'network_id': '50.16.xxx.0/24',
   'pyasn_asn': xxxx,
   'whois_asn': 'xxxx'}},
   # This would NOT be removed
   {'50.16.xxx.0/24': {'asndb_prefix': '50.999.0.0/16',
   'cidr': '50.999.0.0/14',
   'cymru_asn': 'xxxx',
   'cymru_country': 'US',
   'cymru_owner': 'AMAZON-AES - Amazon.com, Inc., US',
   'cymru_prefix': '50.16.0.0/16',
   'ip': '50.16.221.xxx',
   'network_id': '50.16.xxx.0/24',
   'pyasn_asn': xxxx,
   'whois_asn': 'xxxx'}}]

How would I go about doing this? Thank you.

To remove duplicates from a list of dicts:

list_of_unique_dicts = []
for dict_ in list_of_dicts:
    if dict_ not in list_of_unique_dicts:
        list_of_unique_dicts.append(dict_)
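A quick self-contained run of the loop above, on simplified data (the keys and IPs here are illustrative, not from the question). The point is that `in` compares dicts by value, recursively, so an exact duplicate is detected even though the objects are distinct, while an entry that differs in any inner value is kept:

```python
list_of_dicts = [
    {'a': {'ip': '50.16.221.1'}},
    {'a': {'ip': '50.16.221.1'}},   # exact duplicate -> dropped
    {'a': {'ip': '50.16.221.2'}},   # same key, different value -> kept
]

list_of_unique_dicts = []
for dict_ in list_of_dicts:
    # `in` performs an equality check against each element, so a full
    # duplicate is recognized without any hashing requirement
    if dict_ not in list_of_unique_dicts:
        list_of_unique_dicts.append(dict_)

print(list_of_unique_dicts)
# [{'a': {'ip': '50.16.221.1'}}, {'a': {'ip': '50.16.221.2'}}]
```

Note that this approach preserves the original order and places no hashability requirement on the values, at the cost of a linear scan per element.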


If the order of the result does not matter, you can use a set to remove the duplicates by converting the dicts into frozensets:

def remove_dict_duplicates(list_of_dicts):
    """
    Remove duplicates.
    """
    packed = set(((k, frozenset(v.items())) for elem in list_of_dicts for
                 k, v in elem.items()))
    return [{k: dict(v)} for k, v in packed]
This assumes that all values of the innermost dicts are hashable.
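A self-contained run of the function above on simplified data (the values are illustrative, not the question's), also showing where the hashability assumption bites: an inner value that is a list, like `'ips'` in the first example, makes `frozenset` raise `TypeError`:

```python
def remove_dict_duplicates(list_of_dicts):
    """Remove full duplicates; order of the result is not preserved."""
    packed = set((k, frozenset(v.items()))
                 for elem in list_of_dicts for k, v in elem.items())
    return [{k: dict(v)} for k, v in packed]

# Two exact duplicates plus one entry differing in a single inner value.
data = [
    {'50.16.0.0/24': {'cidr': '50.16.0.0/14', 'cymru_asn': '14618'}},
    {'50.16.0.0/24': {'cidr': '50.16.0.0/14', 'cymru_asn': '14618'}},
    {'50.16.0.0/24': {'cidr': '50.16.0.0/14', 'cymru_asn': '99999'}},
]
result = remove_dict_duplicates(data)
print(len(result))  # 2 -- only the exact duplicate was removed

# An unhashable inner value breaks the frozenset conversion:
bad = [{'x': {'ips': ['1.2.3.4']}}]
# remove_dict_duplicates(bad)  # TypeError: unhashable type: 'list'
```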

Giving up order brings a potential speedup for large lists. For example, create a list with 100,000 elements:

inner = {'asndb_prefix': '50.999.0.0/16',
         'cidr': '50.999.0.0/14',
         'cymru_asn': '14618',
         'cymru_country': 'US',
         'cymru_owner': 'AMAZON-AES - Amazon.com, Inc., US',
         'cymru_prefix': '50.16.0.0/16',
         'ip': '50.16.221.xxx',
         'network_id': '50.16.xxx.0/24',
         'pyasn_asn': 14618,
         'whois_asn': '14618'}

large_list = list_of_dicts + [{x: inner} for x in range(int(1e5))]
Repeatedly checking the result list for duplicates takes quite a while:

def remove_dupes(list_of_dicts):
    """Source: answer from wim
    """
    list_of_unique_dicts = []
    for dict_ in list_of_dicts:
        if dict_ not in list_of_unique_dicts:
            list_of_unique_dicts.append(dict_)
    return list_of_unique_dicts

%timeit remove_dupes(large_list)
1 loop, best of 3: 2min 55s per loop
My approach using sets is quite a bit faster:

%timeit remove_dict_duplicates(large_list)
1 loop, best of 3: 590 ms per loop
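The gap is expected: `dict_ not in list_of_unique_dicts` scans the list on every iteration (quadratic overall), while set construction relies on hashing for near-constant-time membership checks. A rough self-contained sketch of just the membership-test difference (sizes chosen arbitrarily):

```python
import timeit

items = list(range(10_000))
as_list = list(items)
as_set = set(items)

# membership in a list scans elements one by one...
t_list = timeit.timeit(lambda: 9999 in as_list, number=1000)
# ...while a set hashes the element for a near-constant-time lookup
t_set = timeit.timeit(lambda: 9999 in as_set, number=1000)

print(t_list > t_set)  # the list lookup is markedly slower
```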



Comments:

What keys? You gave it a list, and lists don't have keys… dictionaries don't allow duplicate keys.

Sorry, I must have misused the terminology. Isn't '23.21.xxx.0/24': considered a key? — Where exactly is the duplicated element in the second code block? — Updated my example in the second code block.

You should mention the additional limitation of this approach: all values must be hashable. — Yes, it looks like the original approach assumes that as well.

This does work, but it doesn't preserve order, as you said, so I believe the answer I chose fits my application best. — But for a large list this one would be much faster. — Agreed, I'll keep that in mind. For now the list is very manageable. Thanks for the explanation!

Thank you very much. I'm also surprised that `in` works this way.