Python: find duplicates across all lists in a list of lists and remove them


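I've read plenty of examples but haven't found exactly what I'm looking for. I've tried a few approaches, but I'm looking for the best one.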

So the idea is:

s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]
The result should be:

['c']
['potato','d']
['h']
because these items are unique across all the lists.

Thanks for any suggestions :)

How about:

[i for i in s1 if i not in s2+s3] #gives ['c']
[j for j in s2 if j not in s1+s3] #gives ['potato', 'd']
[k for k in s3 if k not in s1+s2] #gives ['h']
If you want all of these collected in one list:

uniq = [[i for i in s1 if i not in s2+s3],
[j for j in s2 if j not in s1+s3],
[k for k in s3 if k not in s1+s2]]

#output
[['c'], ['potato', 'd'], ['h']]
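The same membership test generalizes to any number of lists: for each list, collect the items of all the other lists and keep only what is not among them. A minimal sketch, assuming the input is the strings list of lists from the question; the function name unique_per_list is hypothetical. It also builds a set for the other lists instead of concatenating them (s2+s3), so each membership test is a hash lookup rather than a linear scan:

```python
def unique_per_list(lists):
    """For each list, keep the items that do not appear in any other list."""
    result = []
    for i, lst in enumerate(lists):
        # Gather every item from all lists except the i-th one into a set
        others = {x for j, other in enumerate(lists) if j != i for x in other}
        result.append([x for x in lst if x not in others])
    return result


s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
strings = [s1, s2, s3]
print(unique_per_list(strings))  # [['c'], ['potato', 'd'], ['h']]
```

Like the three hand-written comprehensions, this keeps an item that is repeated inside a single list but absent from the others.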

More generally, you can keep a count of all items and then keep only the items that occur exactly once:

In [21]: from collections import Counter

In [23]: counts = Counter(s1 + s2 + s3)

In [24]: [i for i in s1 if counts[i] == 1]
Out[24]: ['c']

In [25]: [i for i in s2 if counts[i] == 1]
Out[25]: ['potato', 'd']

In [26]: [i for i in s3 if counts[i] == 1]
Out[26]: ['h']
If you have a nested list, you can do the following:

In [28]: s = [s1, s2, s3]

In [30]: from itertools import chain

In [31]: counts = Counter(chain.from_iterable(s))

In [32]: [[i for i in lst if counts[i] == 1] for lst in s]
Out[32]: [['c'], ['potato', 'd'], ['h']]

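To find the elements that are unique across the 3 lists, you can use the set symmetric difference (^) and union (|) operations, since you have 3 lists: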

>>> s1 = ['a','b','c']
>>> s2 = ['a','potato','d']
>>> s3 = ['a','b','h']
>>> (set(s1) | set(s2)) ^ set(s3)
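Some caveats about using set operations here: the expression yields one flat set rather than one result per list, and it is only correct for this particular input. An item shared by the first two lists but absent from the third survives both the union and the symmetric difference, and chaining ^ over all three sets keeps every item that appears in an odd number of the sets (the comments below mention 'a'). A quick check, where t1, t2 and t3 are hypothetical inputs chosen to expose the first problem:

```python
s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

# Union of the first two lists, then symmetric difference with the third
flat = (set(s1) | set(s2)) ^ set(s3)
print(sorted(flat))  # ['c', 'd', 'h', 'potato']

# Hypothetical input: 'x' appears in t1 and t2 but not in t3,
# so it is not unique, yet it survives (t1 | t2) ^ t3
t1, t2, t3 = ['x', 'c'], ['x', 'd'], ['h']
print(sorted((set(t1) | set(t2)) ^ set(t3)))  # ['c', 'd', 'h', 'x']

# Chaining ^ keeps items present in an odd number of sets,
# so 'a' (present in all three) wrongly appears
print(sorted(set(s1) ^ set(s2) ^ set(s3)))  # ['a', 'c', 'd', 'h', 'potato']
```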
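Assuming you want this to work for any number of sequences, a straightforward approach (though probably not the most efficient one, since the others set could perhaps be built from the one constructed in the previous iteration) is: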

def deep_unique_set(*seqs):
    for i, seq in enumerate(seqs):
        others = set(x for seq_ in (seqs[:i] + seqs[i + 1:]) for x in seq_)
        yield [x for x in seq if x not in others]
Or, slightly faster but less memory-efficient, and otherwise identical:

def deep_unique_preset(*seqs):
    pile = list(x for seq in seqs for x in seq)
    k = 0
    for seq in seqs:
        num = len(seq)
        others = set(pile[:k] + pile[k + num:])
        yield [x for x in seq if x not in others]
        k += num
Testing it with the provided input:

s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

print(list(deep_unique_set(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]
Note that if the input contains duplicates within one of the lists, they are not removed, i.e.:

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

print(list(deep_unique_set(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]

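If all duplicates should be removed, a better approach is to count the values. The method of choice is collections.Counter, as proposed in another answer.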
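The Counter-based function that appears in the timings as deep_unique_counter is not reproduced in this answer. A minimal sketch of what it could look like, assuming it follows the same generator pattern as the other deep_unique_* functions (this is a reconstruction, not the original definition):

```python
from collections import Counter


def deep_unique_counter(*seqs):
    # Count every occurrence of every item across all sequences
    counts = Counter(x for seq in seqs for x in seq)
    for seq in seqs:
        # Keep only items that occur exactly once overall
        yield [x for x in seq if counts[x] == 1]


s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_counter(s1, s2, s3)))
# [[], ['potato', 'd'], ['h']]
```

Because every occurrence is counted, the doubled 'c' in s1 is removed as well, unlike with deep_unique_set and deep_unique_preset.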

Alternatively, you can keep track of the repeated items, e.g.:

def deep_unique_repeat(*seqs):
    seen = set()
    repeated = set(x for seq in seqs for x in seq if x in seen or seen.add(x))
    for seq in seqs:
        yield [x for x in seq if x not in repeated]
This has the same behavior as the Counter-based approach:

s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_repeat(s1, s2, s3)))
# [[], ['potato', 'd'], ['h']]
but it is slightly faster, since it does not need to keep track of counts that are never used.
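The filter x in seen or seen.add(x) in deep_unique_repeat relies on set.add() returning None. On the first occurrence of an item, x in seen is false, so seen.add(x) runs, records the item, and evaluates to the falsy None, so the item is not collected. Every later occurrence short-circuits on x in seen, which is true, so the item is collected as repeated. A small illustration, with items as a hypothetical input:

```python
items = ['a', 'b', 'a', 'c', 'a']
seen = set()
# First occurrence: recorded by seen.add(x), which returns None (falsy), so skipped.
# Later occurrences: `x in seen` is True, so the item is kept as "repeated".
repeated = {x for x in items if x in seen or seen.add(x)}
print(repeated)      # {'a'}
print(sorted(seen))  # ['a', 'b', 'c']
```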

Another, extremely inefficient approach is to count with list.count() instead of a global counter:

def deep_unique_count(*seqs):
    pile = list(x for seq in seqs for x in seq)
    for seq in seqs:
        yield [x for x in seq if pile.count(x) == 1]
The last two approaches are also proposed in another answer.


Some timings are given below:

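import random

# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
# deep_unique_counter is the Counter-based version mentioned earlier (definition not shown here).
funcs = (deep_unique_set, deep_unique_preset, deep_unique_count,
         deep_unique_repeat, deep_unique_counter)

n = 100
m = 100
s = tuple([random.randint(0, 10 * n * m) for _ in range(n)] for _ in range(m))
for func in funcs:
    print(func.__name__)
    %timeit list(func(*s))
    print()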

# deep_unique_set
# 10 loops, best of 3: 86.2 ms per loop

# deep_unique_preset
# 10 loops, best of 3: 57.3 ms per loop

# deep_unique_count
# 1 loop, best of 3: 1.76 s per loop

# deep_unique_repeat
# 1000 loops, best of 3: 1.87 ms per loop

# deep_unique_counter
# 100 loops, best of 3: 2.32 ms per loop
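%timeit is an IPython magic and is not valid in a plain Python script. A minimal sketch of a comparable measurement with the standard-library timeit module, assuming only two of the functions (copied from above) are compared and a fixed random seed; the numbers it prints will depend on your machine and will not match the figures above:

```python
import random
import timeit


def deep_unique_set(*seqs):
    for i, seq in enumerate(seqs):
        others = set(x for seq_ in (seqs[:i] + seqs[i + 1:]) for x in seq_)
        yield [x for x in seq if x not in others]


def deep_unique_repeat(*seqs):
    seen = set()
    repeated = set(x for seq in seqs for x in seq if x in seen or seen.add(x))
    for seq in seqs:
        yield [x for x in seq if x not in repeated]


n = 100
m = 100
random.seed(0)  # fixed seed so the input is reproducible
s = tuple([random.randint(0, 10 * n * m) for _ in range(n)] for _ in range(m))

for func in (deep_unique_set, deep_unique_repeat):
    # Best of 3 repeats, each calling the function 10 times; f=func binds the current function
    best = min(timeit.repeat(lambda f=func: list(f(*s)), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per loop")
```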
Counter (from collections) is the way to do this:

from collections import Counter

s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]

counts  = Counter(s for sList in strings for s in sList)
uniques = [ [s for s in sList if counts[s]==1] for sList in strings ]

print(uniques) # [['c'], ['potato', 'd'], ['h']]
If you're not allowed to use imported modules, you can use the list's count() method, but it is much less efficient:

allStrings = [ s for sList in strings for s in sList ]
unique     = [[ s for s in sList if allStrings.count(s)==1] for sList in strings]
Using a set to identify the repeated values makes this more efficient:

allStrings = ( s for sList in strings for s in sList )
seen       = set()
repeated   = set( s for s in allStrings if s in seen or seen.add(s))
unique     = [ [ s for s in sList if s not in repeated] for sList in strings ]

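- Already answered, please check this link.
- @LakshmiRam Not the same.
- What happens if s1=['a','b','c','c']?
- This doesn't work, because the symmetric difference returns values that occur an odd number of times (e.g. 'a').
- It is possible if we use union and symmetric_difference.
- What a beautiful and elegant solution. I'll replace my own duplicate-removal function with this one. Thanks.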