How can I efficiently remove consecutive duplicates in a list of lists in Python?
I have a nested list:
l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]
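How can I detect two identical consecutive elements and remove one of them, without using set or a similar operation? This should be the desired output:
l = [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]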
I tried using itertools groupby like this:
from itertools import groupby
[i[0] for i in groupby(l)]
And also an OrderedDict:
from collections import OrderedDict
temp_lis = []
for x in l:
    temp_lis.append(list(OrderedDict.fromkeys(x)))
temp_lis
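The second solution looks like it works. However, it is wrong, because it also removes non-consecutive duplicates (for example, the trailing was and like). How can I get the desired output shown above?

You can use groupby on each sublist, like this: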
[[k for k, g in groupby(x)] for x in l]
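If an element is repeated several times in a row, exactly one copy of it is kept.
If you need to remove consecutively repeated elements entirely, use: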
[[k for k, g in groupby(x) if len(list(g)) == 1] for x in l]
Example:
from itertools import groupby
l = [['GILTI', 'was', 'intended', 'to','to', 'stifle', 'multinationals', 'was'],
['like' ,'technology', 'and', 'and','pharmaceutical', 'companies', 'like']]
print([[k for k, g in groupby(x)] for x in l])
# [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
# ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]
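To make the difference between the two comprehensions concrete, here is a small sketch that runs both on a shorter list. The sample data and the names words, keep_one and drop_runs are made up for illustration and do not come from the question.

```python
from itertools import groupby

# Hypothetical sample data, chosen so that both variants give different results.
words = [['a', 'b', 'b', 'c', 'a'], ['x', 'x', 'x', 'y']]

# Variant 1: collapse each run of equal neighbours to a single element.
keep_one = [[k for k, g in groupby(x)] for x in words]

# Variant 2: drop every element that belongs to a run longer than one.
drop_runs = [[k for k, g in groupby(x) if len(list(g)) == 1] for x in words]

print(keep_one)   # [['a', 'b', 'c', 'a'], ['x', 'y']]
print(drop_runs)  # [['a', 'c', 'a'], ['y']]
```

Note that the non-consecutive 'a' at the end of the first sublist survives in both variants, because groupby only groups adjacent elements.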
- The enumerate() method adds a counter to an iterable and returns it as an enumerate object.
l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]
result = []
for sublist in l:
    new_list = []
    for index, x in enumerate(sublist):
        # skip x when a next element exists and is equal to it
        if index + 1 < len(sublist) and x == sublist[index + 1]:
            continue
        # otherwise keep x (this keeps the last element of each run)
        new_list.append(x)
    # collect the cleaned sublist
    result.append(new_list)
print(result)
Output:
[['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]
A custom generator solution:
def deduped(seq):
    first = True
    for el in seq:
        # the first element is always kept; later ones only if they differ from prev
        if first or el != prev:
            yield el
        prev = el
        first = False
[list(deduped(seq)) for seq in l]
# => [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
# ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]
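Edit: the previous version could not handle None as the first element.

Thanks again for the help! What about a more specific solution? What if I am only interested in removing the 'to', 'to' sequence?
@aywoki, so you don't want both 'to's?
Yes, I'm just curious how to iterate in that case.
This solution fixes the problem, although a prev = object() sentinel would also solve the first-element issue.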
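The sentinel idea from the last comment, and one possible reading of the 'to', 'to' follow-up, can be sketched as below. This is only an illustration: the only parameter is a hypothetical addition, not part of the answer, and it assumes that the goal is to collapse repeats of the listed values while leaving other repeats alone.

```python
def deduped(seq, only=None):
    """Yield the items of seq without consecutive repeats.

    only is a hypothetical extra parameter: when given, only repeats of
    values contained in it are collapsed; other repeats are kept.
    """
    prev = object()  # unique sentinel, never equal to a real element (not even None)
    for el in seq:
        # keep el if it differs from its predecessor,
        # or if it is not one of the values we were asked to collapse
        if el != prev or (only is not None and el not in only):
            yield el
        prev = el

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]

print([list(deduped(seq)) for seq in l])
print([list(deduped(seq, only={'to'})) for seq in l])
```

With only={'to'}, the doubled 'to' in the first sublist is collapsed, while the doubled 'and' in the second sublist is left as it is. Because the sentinel is a fresh object(), no first flag is needed and a leading None is handled like any other value.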