Python: create combinations from a list and remove those in which the substring before a delimiter appears in more than one sub-element

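I have a list, and I use itertools.combinations to create all of its combinations. The element inside each list item can be split on the string ":". I need to remove every combination in which the same substring occurs in more than one element. The characters of the string up to the ":" (the delimiter to use for a regex match?) need to be checked against every sub-element of the combination. Or is there a better way?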

from itertools import combinations

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]
outputList = list(combinations(inList,3))
outputList
The result I get is:

[(['TEST1: sub1'], ['TEST1: sub2']),
 (['TEST1: sub1'], ['TEST1: sub3']),
 (['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TEST1: sub3']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['TESTING FOR FUN: random text x2']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]
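But I want to remove the combinations whose sub-elements share the same substring up to the delimiter ":".

Desired output, after checking whether that substring occurs more than once among the sub-elements of a combination: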

[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

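*Notice that the first two items in the list have been removed in the desired output. The same applies to every other case where the substring before the ":" repeats, regardless of string length.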

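If the desired output is correct, the problem can be broken down into three separate steps:

First, the delimiter expresses a key-value relationship, so before doing anything else you can use a dictionary to group the data that share the same key.

Second, take every n-length combination of the groups, so that each combination only mixes data with different keys.

Finally, for each of these combinations, use itertools.product to get every possible tuple that takes one item from each group in the combination.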

from itertools import combinations, product
from collections import defaultdict

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]


inDict = defaultdict(list)
for lst in inList:
    key = lst[0].partition(':')[0]
    inDict[key].append(lst)

print(inDict)
#Output:
defaultdict(list,
            {'TEST1': [['TEST1: sub1'], ['TEST1: sub2'], ['TEST1: sub3']],
             'TESTING FOR FUN': [['TESTING FOR FUN: randomtext'],
              ['TESTING FOR FUN: random text x2']],
             'ABC123': [['ABC123: dog']]})


temp = combinations(inDict.values(), 2) #2 length pairs from all dict values. change the number here as needed
result = []
for group in temp:
    result.extend(product(*group)) #calculate all products for each pair of lists. 

print(result)
#Output:
[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]
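
As an alternative to grouping first, the same filtering can be sketched directly on top of itertools.combinations: generate every combination and keep only those whose keys (the text before the first ":") are all different. This is only a minimal sketch, not part of the answer above. The helper names key_of and combos_with_distinct_keys are made up for illustration, and it assumes every list item holds exactly one string, as in the question.

```python
from itertools import combinations


def key_of(item):
    """Return the text before the first ':' in the item's only string."""
    return item[0].partition(':')[0]


def combos_with_distinct_keys(items, r):
    """Yield the r-length combinations whose keys are all different."""
    for combo in combinations(items, r):
        keys = [key_of(item) for item in combo]
        # A set drops duplicates, so equal lengths mean no key repeats.
        if len(set(keys)) == len(keys):
            yield combo


inList = [['TEST1: sub1'],
          ['TEST1: sub2'],
          ['TEST1: sub3'],
          ['TESTING FOR FUN: randomtext'],
          ['TESTING FOR FUN: random text x2'],
          ['ABC123: dog']]

pairs = list(combos_with_distinct_keys(inList, 2))
print(len(pairs))  # 11
for pair in pairs:
    print(pair)
```

This version visits every combination, including the ones it throws away, so the grouping approach above does less work when many items share a key. In exchange, the filter keeps the results in the order that combinations produces them, which is the order shown in the desired output of the question.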

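Your code shows combinations of length 3, but the example output you provided does not match it: each item there is a combination of length 2. There is definitely a better way, but I'm not sure which one you need. Can you clarify?

@Paritosh Singh you're a genius! Good catch. The length of 3 was copied and pasted from a different desired result and is incorrect; you're right. I tested with length 2 and got the result I wanted. This is exactly what I was looking for, thanks a lot! Can you explain the asterisk (*) in result.extend(product(*group))?

@Chris yep, that is called the unpacking operator. I use it to unpack the tuple containing the 2 lists before passing them to product. It effectively removes the tuple, so the call is similar to product(group[0], group[1]), but it can dynamically unpack however many arguments are passed. You may want to read up on it further online.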
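
To make the effect of the asterisk concrete, here is a small illustrative sketch. The variable names and sample values are invented for the example and do not come from the thread.

```python
from itertools import product

# A tuple holding two lists, shaped like one `group` in the answer's loop.
group = (['a', 'b'], ['x', 'y'])

# The star passes each list in the tuple as its own positional argument.
unpacked = list(product(*group))
explicit = list(product(group[0], group[1]))
print(unpacked == explicit)  # True

# Without the star, product receives a single iterable (the tuple itself)
# and yields one-element tuples instead of pairs.
no_star = list(product(group))
print(no_star)  # [(['a', 'b'],), (['x', 'y'],)]

# The same call works unchanged when the tuple holds three lists.
triple = (['a', 'b'], ['x'], [1, 2])
print(len(list(product(*triple))))  # 4
```

This is why the answer's loop does not need to change when the 2 in combinations(inDict.values(), 2) is replaced by another length.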