Python: How do I remove duplicates from a list while preserving order?

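Is there a built-in that removes duplicates from a list in Python while preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this: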

def uniq(input):
    output = []
    for x in input:
        if x not in output:
            output.append(x)
    return output
(Thanks for the help.)

But I'd like to use a built-in or a more Pythonic idiom if possible.


Here are some alternatives:

The fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
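Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add on each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.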

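If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set:

O(1) insertion, deletion and membership check per operation.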
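As a rough illustration of an ordered set with O(1) operations, here is a minimal sketch built on a plain dict. OrderedSet is a hypothetical class (not a standard-library type, and not necessarily the recipe the answer had in mind); it assumes Python 3.7+, where dicts preserve insertion order.

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Minimal insertion-ordered set backed by a dict (hypothetical sketch).

    Relies on dicts preserving insertion order (guaranteed since Python 3.7),
    so add, discard and membership tests are average-case O(1).
    """

    def __init__(self, iterable=()):
        # Keys hold the elements; the values are unused.
        self._items = dict.fromkeys(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        # Re-adding an existing key keeps its original position.
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def __repr__(self):
        return f"{type(self).__name__}({list(self._items)!r})"


s = OrderedSet([3, 1, 3, 2, 1])
s.add(5)
s.discard(1)
print(list(s))  # [3, 2, 5]
```

Subclassing collections.abc.MutableSet only requires the five methods above; the mixin then supplies the remaining set operations (union, comparison, and so on).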

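(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)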
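To make the trick in f7 explicit, here is the same logic written as a plain loop. f7_explicit is a hypothetical name used only for illustration; it should return the same result as f7, just without relying on seen_add(x) returning None inside a boolean expression.

```python
def f7_explicit(seq):
    """Loop equivalent of f7's comprehension (hypothetical name, for illustration)."""
    seen = set()
    result = []
    for x in seq:
        # f7 writes this test as `not (x in seen or seen_add(x))`: when x is
        # new, seen_add(x) runs for its side effect and returns None (falsy).
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


print(f7_explicit([1, 2, 0, 1, 3, 2]))  # [1, 2, 0, 3]
print(set().add(1))  # None
```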

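The list doesn't even have to be sorted; the sufficient condition is that equal values are grouped together.

Edit: I assumed that "preserving order" implies that the list is actually ordered. If this is not the case, then the solution from MizardX is the right one.


Community edit: This is, however, the most elegant way to "compress duplicate consecutive elements into a single element".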
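The code for this answer is not shown here; as an assumption, the standard tool for collapsing runs of equal values is itertools.groupby. The sketch below shows why equal values only need to be adjacent, not sorted, and what happens when they are not.

```python
from itertools import groupby

data = [1, 1, 2, 2, 2, 3, 1, 1]

# groupby starts a new group each time the value changes, so taking one key
# per group collapses *consecutive* duplicates only.
collapsed = [key for key, _group in groupby(data)]
print(collapsed)  # [1, 2, 3, 1]

# To remove *all* duplicates this way, equal values must be adjacent first,
# e.g. by sorting (which discards the original order).
print([key for key, _group in groupby(sorted(data))])  # [1, 2, 3]
```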

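If you need a one-liner, then maybe this would help:

from functools import reduce  # reduce is no longer a builtin in Python 3
reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x], lst), [])
...should work, but correct me if I'm wrong.

For unhashable types (e.g. lists of lists), based on MizardX's:

def f7_noHash(seq):
    seen = set()
    return [x for x in seq if str(x) not in seen and not seen.add(str(x))]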
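Keying on str(x) treats values with the same string form as duplicates (for example, 1 and '1' both become '1'). A sketch of an alternative, assuming a mix of hashable and unhashable items: use a set when hashing works and fall back to a linear scan when hash() raises TypeError. unique_any is a hypothetical helper name.

```python
def unique_any(seq):
    """Order-preserving dedupe that tolerates unhashable items (hypothetical helper).

    Hashable items are tracked in a set (O(1) lookups); unhashable ones such as
    lists fall back to a linear scan of a list, so they cost O(n) each.
    """
    seen_hashable = set()
    seen_unhashable = []
    result = []
    for x in seq:
        try:
            if x in seen_hashable:
                continue
            seen_hashable.add(x)
        except TypeError:  # raised when x cannot be hashed (list, dict, set, ...)
            if x in seen_unhashable:
                continue
            seen_unhashable.append(x)
        result.append(x)
    return result


print(unique_any([[1, 2], 1, "1", [1, 2], 1]))  # [[1, 2], 1, '1']
```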

MizardX's answer gives a good collection of multiple approaches.

This is what I came up with while thinking out loud:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]
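Because this comprehension checks the *rest* of the list, it keeps the last occurrence of each value, so the resulting order can differ from first-occurrence order. The small example below compares it with a variant that checks the prefix instead; both are O(n²).

```python
mylist = [1, 2, 1, 3]

# Keeps x only if it does not appear again later, i.e. the LAST occurrence wins.
last_wins = [x for i, x in enumerate(mylist) if x not in mylist[i + 1:]]
print(last_wins)  # [2, 1, 3]

# Checking the prefix instead keeps the FIRST occurrence.
first_wins = [x for i, x in enumerate(mylist) if x not in mylist[:i]]
print(first_wins)  # [1, 2, 3]
```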

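You can reference a list comprehension as it is being built via the symbol '_[1]'.
For example, the following function unique-ifies a list of elements without changing their order by referencing its own list comprehension. (This relies on an undocumented implementation detail of older CPython 2 releases; it does not work in Python 3.)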

def unique(my_list): 
    return [x for x in my_list if x not in locals()['_[1]']]
Demo:

Output:

[1, 2, 3, 4, 5]

unique → ['1', '2', '3', '6', '4', '5']

I think if you want to maintain the order,

you can try this:
Or similarly, you can do this:
You can also do this:
It can also be written like this:
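The code for these variants is not shown here. One common order-preserving approach of this kind (assumed, not necessarily the answer's own) sorts the distinct values by the index of their first occurrence:

```python
items = ["b", "c", "d", "b", "c", "a", "a"]

# Sort the distinct values by the position of their first occurrence.
# list.index is a linear scan, so this costs O(n * k) for k distinct values.
ordered_unique = sorted(set(items), key=items.index)
print(ordered_unique)  # ['b', 'c', 'd', 'a']
```

This is short, but the repeated index lookups make it much slower than dict.fromkeys on large inputs with many distinct values.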
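Edit 2020

As of CPython/PyPy 3.6 (and as a language guarantee in 3.7), plain dict is insertion-ordered, and even more efficient than the (also C-implemented) collections.OrderedDict. So the fastest solution, by far, is also the simplest: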

>>> items = [1, 2, 0, 1, 3, 2]
>>> list(dict.fromkeys(items))
[1, 2, 0, 3]
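Like list(set(items)), this pushes all the work to the C layer (on CPython), but since dicts are insertion-ordered, dict.fromkeys doesn't lose ordering. It's slower than list(set(items)) (typically taking 50-100% longer), but much faster than any other order-preserving solution (taking about half the time).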
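dict.fromkeys treats two items as duplicates only when they are equal. If uniqueness should depend on a derived key (case-insensitive strings, say), a dict can still do the work while keeping the first item seen per key. unique_by is a hypothetical helper, and the sketch assumes Python 3.7+ dict ordering.

```python
def unique_by(items, key):
    """Keep the first item for each key, in first-seen order (hypothetical helper).

    dict.setdefault only stores a value when the key is new, and dicts keep
    insertion order (Python 3.7+), so the values come out in first-seen order.
    """
    first = {}
    for item in items:
        first.setdefault(key(item), item)
    return list(first.values())


print(unique_by(["Apple", "banana", "apple", "Banana", "cherry"], str.lower))
# ['Apple', 'banana', 'cherry']
```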

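Edit 2016

As Raymond pointed out, in Python 3.5+, where OrderedDict is implemented in C, the list comprehension approach will be slower than OrderedDict (unless you actually need a list at the end, and even then only if the input is very short). So the best solution for 3.5+ is OrderedDict.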

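Important Edit 2015

As noted, the more_itertools library (pip install more_itertools) contains a unique_everseen function built to solve this problem without any unreadable not seen.add mutations in list comprehensions. This is also the fastest solution: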

>>> from  more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]
Just one simple library import and no hacks. This comes from an implementation of the itertools recipe unique_everseen, which looks like:

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
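Because the recipe is a generator, it also works on unbounded input, where list(dict.fromkeys(...)) would never return. A stripped-down sketch to illustrate (unique_lazily is a hypothetical name; it omits the key argument and the filterfalse optimisation):

```python
from itertools import count, islice


def unique_lazily(iterable):
    """Minimal seen-set generator (hypothetical name; no key argument)."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element  # emitted immediately, before the rest of the input is read


# count() never ends, so list(dict.fromkeys(...)) on this stream would never return.
endless = (i // 2 for i in count())  # 0, 0, 1, 1, 2, 2, ...
first_five = list(islice(unique_lazily(endless), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```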

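In Python 2.7+, the accepted common idiom for this (which works but isn't optimized for speed; I would now use unique_everseen) uses collections.OrderedDict:

Runtime: O(N)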
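A minimal sketch of what the OrderedDict-based idiom typically looks like (OrderedDict.fromkeys is a standard classmethod; the exact code from this answer is assumed):

```python
from collections import OrderedDict

items = [1, 2, 0, 1, 3, 2]

# fromkeys inserts each item as a key; repeated keys keep their first position.
result = list(OrderedDict.fromkeys(items))
print(result)  # [1, 2, 0, 3]
```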

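This looks much nicer than:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]
and doesn't use the ugly hack:

not seen.add(x)
which relies on the fact that set.add is an in-place method that always returns None, so not None evaluates to True.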


Note, however, that the hack solution is faster in raw speed, although it has the same runtime complexity, O(N).

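Borrowing the recursive idea used in defining Haskell's nub function for lists, this would be a recursive approach:

def unique(lst):
    # Keep the head, drop its later copies from the tail, then recurse on what is left.
    # A list comprehension replaces filter(), which returns a lazy iterator in Python 3.
    return [] if len(lst) == 0 else [lst[0]] + unique([x for x in lst[1:] if x != lst[0]])
e.g.: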

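I tried it on increasing data sizes and saw sub-linear time complexity (not definitive, but it suggests this should be fine for normal data). Note that each distinct value adds one level of recursion, so inputs with more than roughly 1000 distinct values (Python's default recursion limit) will raise RecursionError.

I also think it's interesting that this can easily be generalized to uniqueness under other operations, like this: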

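import operator

def unique(lst, cmp_op=operator.ne):
    # Keep a tail element only if cmp_op(element, head) is true, i.e. it counts as "different" from the head.
    return [] if len(lst) == 0 else [lst[0]] + unique([x for x in lst[1:] if cmp_op(x, lst[0])], cmp_op)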
For example, you could pass in a function that treats numbers rounding to the same integer as if they were "equal" for uniqueness purposes, like this:

def test_round(x, y):
    # "Different" means rounding to different integers.
    return round(x) != round(y)
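Then unique(some_list, test_round) would return the unique elements of the list, where uniqueness no longer means traditional equality (which is implied by any set-based or dict-key-based approach to this problem) but instead means taking, for each integer K that the elements might round to, only the first element that rounds to K, e.g.: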

In [6]: unique([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], test_round)
Out[6]: [1.2, 5, 1.9, 4.2, 3]
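When the custom notion of equality can be expressed as "same key" (here, the same round() result), the same first-per-key selection can be done iteratively in O(n) with a set of keys. unique_by_key is a hypothetical helper; an arbitrary pairwise predicate that is not an equivalence on some key cannot be reduced to this form.

```python
def unique_by_key(seq, key):
    """Keep the first element for each key value (hypothetical helper).

    Matches the recursive unique(seq, test_round) when "equal" means
    "same key", but runs in O(n) because keys are tracked in a set.
    """
    seen = set()
    out = []
    for x in seq:
        k = key(x)
        if k not in seen:
            seen.add(k)
            out.append(x)
    return out


print(unique_by_key([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], round))  # [1.2, 5, 1.9, 4.2, 3]
```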

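A relatively efficient approach for a sorted numpy array:

import numpy as np

b = np.array([1, 3, 3, 8, 12, 12, 12])
# keep the first element plus every element that differs from its predecessor
np.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0] != x[1]]])
Output: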

array([ 1,  3,  8, 12])
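The same idea can be fully vectorized with a boolean mask instead of a Python-level zip (assumes a non-empty, sorted array). For unsorted input, np.unique with return_index=True gives the index of each value's first occurrence, and sorting those indices restores the original order.

```python
import numpy as np

b = np.array([1, 3, 3, 8, 12, 12, 12])

# Sorted, non-empty input: keep the first element plus every element
# that differs from its predecessor.
mask = np.concatenate(([True], b[1:] != b[:-1]))
print(b[mask])  # [ 1  3  8 12]

# Unsorted input: first-occurrence indices, re-sorted into original order.
a = np.array([3, 1, 3, 2, 1])
_, first_idx = np.unique(a, return_index=True)
print(a[np.sort(first_idx)])  # [3 1 2]
```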

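Another very late answer to another very old question:

The itertools recipes have a function that does this using the seen set technique, but that:

  • handles a standard key function
  • uses no unseemly hacks
  • optimizes the loop by pre-binding seen.add instead of looking it up N times. (f7 also does this, but s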
    In [118]: unique([1,5,1,1,4,3,4])
    Out[118]: [1, 5, 4, 3]
    
    In [122]: %timeit unique(np.random.randint(5, size=(1)))
    10000 loops, best of 3: 25.3 us per loop
    
    In [123]: %timeit unique(np.random.randint(5, size=(10)))
    10000 loops, best of 3: 42.9 us per loop
    
    In [124]: %timeit unique(np.random.randint(5, size=(100)))
    10000 loops, best of 3: 132 us per loop
    
    In [125]: %timeit unique(np.random.randint(5, size=(1000)))
    1000 loops, best of 3: 1.05 ms per loop
    
    In [126]: %timeit unique(np.random.randint(5, size=(10000)))
    100 loops, best of 3: 11 ms per loop
    
    from itertools import filterfalse  # named ifilterfalse in Python 2
    
    def unique(iterable):
        seen = set()
        seen_add = seen.add
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    
    [l[i] for i in range(len(l)) if l.index(l[i]) == i]
    
    l = [1,2,2,3,3,...]
    n = []
    n.extend(ele for ele in l if ele not in set(n))
    
    >>> from functools import reduce
    >>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
    >>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
    [5, 6, 1, 2, 3, 4]
    
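    from functools import reduce  # not a builtin in Python 3
    
    # The accumulator is a (list, set) pair:
    # use the list to keep order
    # use the set to make lookups faster
    
    def reducer(result, item):
        if item not in result[1]:
            result[0].append(item)
            result[1].add(item)
        return result
    
    >>> reduce(reducer, l, (list(), set()))[0]  # fresh accumulator on every call
    [5, 6, 1, 2, 3, 4]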
    
    def uniquefy_list(a):
        # Keeps the LAST occurrence of each element; recursion depth grows with len(a).
        if not a:
            return []
        return uniquefy_list(a[1:]) if a[0] in a[1:] else [a[0]] + uniquefy_list(a[1:])
    
        import pandas as pd
        import numpy as np
    
        uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()
    
        # from the chosen answer 
        def f7(seq):
            seen = set()
            seen_add = seen.add
            return [ x for x in seq if not (x in seen or seen_add(x))]
    
        alist = np.random.randint(low=0, high=1000, size=10000).tolist()
    
        print(uniquifier(alist) == f7(alist))  # True
    
        In [104]: %timeit f7(alist)
        1000 loops, best of 3: 1.3 ms per loop
        In [110]: %timeit uniquifier(alist)
        100 loops, best of 3: 4.39 ms per loop
    
    def deduplicate(l):
        count = {}
        (read,write) = (0,0)
        while read < len(l):
            if l[read] in count:
                read += 1
                continue
            count[l[read]] = True
            l[write] = l[read]
            read += 1
            write += 1
        return l[0:write]
    
    text = "ask not what your country can do for you ask what you can do for your country"
    sentence = text.split(" ")
    noduplicates = [(sentence[i]) for i in range (0,len(sentence)) if sentence[i] not in sentence[:i]]
    print(noduplicates)
    
    ['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']
    
    >>> list(dict.fromkeys('abracadabra'))
    ['a', 'b', 'r', 'c', 'd']
    
    >>> from collections import OrderedDict
    >>> list(OrderedDict.fromkeys('abracadabra'))
    ['a', 'b', 'r', 'c', 'd']
    
    >>> from iteration_utilities import unique_everseen
    >>> lst = [1,1,1,2,3,2,2,2,1,3,4]
    
    >>> list(unique_everseen(lst))
    [1, 2, 3, 4]
    
    %matplotlib notebook
    
    from iteration_utilities import unique_everseen
    from collections import OrderedDict
    from more_itertools import unique_everseen as mi_unique_everseen
    
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [x for x in seq if not (x in seen or seen_add(x))]
    
    def iteration_utilities_unique_everseen(seq):
        return list(unique_everseen(seq))
    
    def more_itertools_unique_everseen(seq):
        return list(mi_unique_everseen(seq))
    
    def odict(seq):
        return list(OrderedDict.fromkeys(seq))
    
    from simple_benchmark import benchmark
    
    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: list(range(2**i)) for i in range(1, 20)},
                  'list size (no duplicates)')
    b.plot()
    
    import random
    
    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
                  'list size (lots of duplicates)')
    b.plot()
    
    b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
                  {2**i: [1]*(2**i) for i in range(1, 20)},
                  'list size (only duplicates)')
    b.plot()
    
    >>> lst = [{1}, {1}, {2}, {1}, {3}]
    
    >>> list(unique_everseen(lst))
    [{1}, {2}, {3}]
    
    import pandas as pd
    
    my_list = [0, 1, 2, 3, 4, 1, 2, 3, 5]
    
    >>> pd.Series(my_list).drop_duplicates().tolist()
    # Output:
    # [0, 1, 2, 3, 4, 5]
    
    >>> lst = [1, 2, 1, 3, 3, 2, 4]
    >>> list(dict.fromkeys(lst))
    [1, 2, 3, 4]
    
    for i in range(len(l)-1,0,-1): 
        if l[i] in l[:i]: del l[i] 
    
    In [91]: from random import randint, seed                                                                                            
    In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics                                                                 
    In [93]: for i in range(len(l)-1,0,-1): 
        ...:     print(l) 
        ...:     print(i, l[i], l[:i], end='') 
        ...:     if l[i] in l[:i]: 
        ...:          print( ': remove', l[i]) 
        ...:          del l[i] 
        ...:     else: 
        ...:          print() 
        ...: print(l)
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
    11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
    10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
    [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
    9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
    [6, 5, 1, 4, 6, 1, 6, 2, 2]
    8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
    [6, 5, 1, 4, 6, 1, 6, 2]
    7 2 [6, 5, 1, 4, 6, 1, 6]
    [6, 5, 1, 4, 6, 1, 6, 2]
    6 6 [6, 5, 1, 4, 6, 1]: remove 6
    [6, 5, 1, 4, 6, 1, 2]
    5 1 [6, 5, 1, 4, 6]: remove 1
    [6, 5, 1, 4, 6, 2]
    4 6 [6, 5, 1, 4]: remove 6
    [6, 5, 1, 4, 2]
    3 4 [6, 5, 1]
    [6, 5, 1, 4, 2]
    2 1 [6, 5]
    [6, 5, 1, 4, 2]
    1 5 [6]
    [6, 5, 1, 4, 2]
    
    
    # for hashable sequence
    def remove_duplicates(items):
        seen = set()
        for item in items:
            if item not in seen:
                yield item
                seen.add(item)
    
    a = [1, 5, 2, 1, 9, 1, 5, 10]
    list(remove_duplicates(a))
    # [1, 5, 2, 9, 10]
    
    
    
    # for unhashable sequence
    def remove_duplicates(items, key=None):
        seen = set()
        for item in items:
            val = item if key is None else key(item)
            if val not in seen:
                yield item
                seen.add(val)
    
    a = [ {'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
    list(remove_duplicates(a, key=lambda d: (d['x'],d['y'])))
    # [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
    
    def DelDupes(aseq) :
        seen = set()
        return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]
    
    def HasDupes(aseq) :
        s = set()
        return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)
    
    def GetDupes(aseq) :
        s = set()
        return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))
    
    list1 = ["hello", " ", "w", "o", "r", "l", "d"]
    sorted(set(list1 ), key=lambda x:list1.index(x))
    
    ["hello", " ", "w", "o", "r", "l", "d"]
    
    >>> import pandas as pd
    >>> lst = [1, 2, 1, 3, 3, 2, 4]
    >>> pd.unique(lst)
    array([1, 2, 3, 4])
    
    def solve(arr): 
        return list(dict.fromkeys(arr[::-1]))[::-1]
    
    x = [1, 2, 1, 3, 1, 4]
    
    # brute force method: O(n^2), keeps the first occurrence of each element
    arr = []
    for i in x:
        if i not in arr:
            arr.append(i)
    # arr is now [1, 2, 3, 4]
    
    # recursive method: fills the module-level list tmp
    tmp = []
    def remove_duplicates(j=0):
        if j < len(x):
            if x[j] not in tmp:
                tmp.append(x[j])
            remove_duplicates(j + 1)
    
    remove_duplicates()
    # tmp is now [1, 2, 3, 4]