Python: find duplicates across all lists in a list of lists and remove them
I have read a lot of examples, but I haven't found quite what I'm after. I have tried several approaches, but I'm still looking for the best one. So, the idea is:
s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]
The result should be:
['c']
['potato','d']
['h']
because those items are unique across all of the lists.
Thanks in advance for any suggestions :)
How about:
[i for i in s1 if i not in s2+s3] #gives ['c']
[j for j in s2 if j not in s1+s3] #gives ['potato', 'd']
[k for k in s3 if k not in s1+s2] #gives ['h']
If you want to have all of these items in a single list:
uniq = [[i for i in s1 if i not in s2+s3],
        [j for j in s2 if j not in s1+s3],
        [k for k in s3 if k not in s1+s2]]
#output
[['c'], ['potato', 'd'], ['h']]
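The pairwise pattern above hard-codes the three lists; it can be generalized to any number of lists with a small helper. A minimal sketch (the name `unique_per_list` is illustrative, not from the original answer):

```python
def unique_per_list(*lists):
    # For each list, keep only the items that appear in no other list.
    result = []
    for i, cur in enumerate(lists):
        # flatten all lists except the current one
        others = [x for j, lst in enumerate(lists) if j != i for x in lst]
        result.append([x for x in cur if x not in others])
    return result

s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(unique_per_list(s1, s2, s3))  # [['c'], ['potato', 'd'], ['h']]
```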
More generally, you can keep a count of all the items and then keep only the ones that appear exactly once:
In [21]: from collections import Counter
In [23]: counts = Counter(s1 + s2 + s3)
In [24]: [i for i in s1 if counts[i] == 1]
Out[24]: ['c']
In [25]: [i for i in s2 if counts[i] == 1]
Out[25]: ['potato', 'd']
In [26]: [i for i in s3 if counts[i] == 1]
Out[26]: ['h']
If you have the nested list, you can do:
In [28]: s = [s1, s2, s3]
In [30]: from itertools import chain
In [31]: counts = Counter(chain.from_iterable(s))
In [32]: [[i for i in lst if counts[i] == 1] for lst in s]
Out[32]: [['c'], ['potato', 'd'], ['h']]
To find the elements that are unique across the 3 lists, you can use the set symmetric difference (^) and union (|) operations, since you have 3 lists:
>>> s1 = ['a','b','c']
>>> s2 = ['a','potato','d']
>>> s3 = ['a','b','h']
>>> (set(s1) | set(s2)) ^ set(s3)
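As the comments at the end of this page point out, this set expression happens to produce the right values for the sample data but is not a general solution. A quick sketch of a failure case (adding 'potato' to s1, so it appears in two lists yet still survives):

```python
s1 = ['a', 'b', 'c', 'potato']   # 'potato' now appears in both s1 and s2
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']

# 'a' and 'b' cancel out, but the duplicated 'potato' is kept anyway,
# because it only occurs on one side of the symmetric difference.
result = (set(s1) | set(s2)) ^ set(s3)
print(sorted(result))  # ['c', 'd', 'h', 'potato']
```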
Assuming you want this to work for an arbitrary number of sequences, a direct way of solving this (but possibly not the most efficient, since the `others` object could be built incrementally from the previous iteration) is:
def deep_unique_set(*seqs):
    for i, seq in enumerate(seqs):
        # collect every item from all the *other* sequences
        others = set(x for seq_ in (seqs[:i] + seqs[i + 1:]) for x in seq_)
        yield [x for x in seq if x not in others]
Or, slightly faster but less memory-efficient and otherwise identical:
def deep_unique_preset(*seqs):
    # flatten everything once up front
    pile = list(x for seq in seqs for x in seq)
    k = 0
    for seq in seqs:
        num = len(seq)
        # everything in the pile except the current sequence's slice
        others = set(pile[:k] + pile[k + num:])
        yield [x for x in seq if x not in others]
        k += num
Testing these with the provided input:
s1 = ['a', 'b', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_set(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c'], ['potato', 'd'], ['h']]
Note that if the input contains duplicates within one of the lists, these are not removed, i.e.:
s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_set(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]
print(list(deep_unique_preset(s1, s2, s3)))
# [['c', 'c'], ['potato', 'd'], ['h']]
If all duplicated entries should be removed, a better approach is to count the values; the method of choice for doing this is to use collections.Counter, as described above. Alternatively, one can keep track of the repeating values directly, e.g.:
def deep_unique_repeat(*seqs):
    seen = set()
    # set.add() returns None (falsy), so the `or` records each item on first
    # sight and collects it into `repeated` from the second sight onward
    repeated = set(x for seq in seqs for x in seq if x in seen or seen.add(x))
    for seq in seqs:
        yield [x for x in seq if x not in repeated]
This has the same behavior as the Counter-based approach:
s1 = ['a', 'b', 'c', 'c']
s2 = ['a', 'potato', 'd']
s3 = ['a', 'b', 'h']
print(list(deep_unique_repeat(s1, s2, s3)))
# [[], ['potato', 'd'], ['h']]
but it is slightly faster, because it does not need to keep track of counts that are never used.
Another (largely inefficient) way of doing this is to use list.count() instead of a global counter:
def deep_unique_count(*seqs):
    pile = list(x for seq in seqs for x in seq)
    for seq in seqs:
        yield [x for x in seq if pile.count(x) == 1]
The last two approaches were also proposed in this post.
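The timings below also include `deep_unique_counter`, whose definition is not shown above; a plausible sketch of that Counter-based variant (an assumption, matching the nested-list Counter approach shown earlier) would be:

```python
from collections import Counter
from itertools import chain

def deep_unique_counter(*seqs):
    # count every item across all sequences, then keep only those seen once
    counts = Counter(chain.from_iterable(seqs))
    for seq in seqs:
        yield [x for x in seq if counts[x] == 1]

print(list(deep_unique_counter(['a', 'b', 'c', 'c'],
                               ['a', 'potato', 'd'],
                               ['a', 'b', 'h'])))
# [[], ['potato', 'd'], ['h']]
```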
Some timings are provided below:
import random

# `funcs` is assumed to hold the functions defined above
n = 100
m = 100
s = tuple([random.randint(0, 10 * n * m) for _ in range(n)] for _ in range(m))
for func in funcs:
    print(func.__name__)
    %timeit list(func(*s))
    print()
# deep_unique_set
# 10 loops, best of 3: 86.2 ms per loop
# deep_unique_preset
# 10 loops, best of 3: 57.3 ms per loop
# deep_unique_count
# 1 loop, best of 3: 1.76 s per loop
# deep_unique_repeat
# 1000 loops, best of 3: 1.87 ms per loop
# deep_unique_counter
# 100 loops, best of 3: 2.32 ms per loop
Counter (from collections) is the way to go here:
from collections import Counter
s1 = ['a','b','c']
s2 = ['a','potato','d']
s3 = ['a','b','h']
strings=[s1,s2,s3]
counts = Counter(s for sList in strings for s in sList)
uniques = [ [s for s in sList if counts[s]==1] for sList in strings ]
print(uniques) # [['c'], ['potato', 'd'], ['h']]
If you are not allowed to use imported modules, you can use list's count() method instead, but it is far less efficient:
allStrings = [ s for sList in strings for s in sList ]
unique = [[ s for s in sList if allStrings.count(s)==1] for sList in strings]
Using a set to identify the repeated values is more efficient:
allStrings = ( s for sList in strings for s in sList )
seen = set()
repeated = set( s for s in allStrings if s in seen or seen.add(s))
unique = [ [ s for s in sList if s not in repeated] for sList in strings ]
Comments:
- Already answered, please check this link. @LakshmiRam It's not the same. What happens if s1 = ['a','b','c','c']?
- This doesn't work, because the symmetric difference returns the values that occur an odd number of times (e.g. 'a'). It is possible if we use union together with symmetric_difference.
- What a neat and elegant solution! I will replace my own duplicate-removal function with this one. Thank you.