Python: create combinations from a list and remove those where the substring before the delimiter appears in more than one sub-element


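I have a list, and I use itertools.combinations to create all of its combinations. The element in each list item can be split on the string ":". I need to remove every combination in which the same substring before the ":" appears in more than one element. The characters of the string up to the ":" (a delimiter for a regex match???) need to be checked against every sub-element of the combination. Or is there a better way?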

from itertools import combinations

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]
outputList = list(combinations(inList,3))
outputList

The result I get is:

[(['TEST1: sub1'], ['TEST1: sub2']),
 (['TEST1: sub1'], ['TEST1: sub3']),
 (['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TEST1: sub3']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['TESTING FOR FUN: random text x2']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

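But I want to remove the combinations whose sub-elements share the same substring up to the delimiter ":".

Desired output, after checking whether a sub-element's prefix occurs more than once among the other sub-elements of the list item: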

[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

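*Notice that the first two items in the list are removed in the desired output? (The same applies to any other occurrence of the substring before the ":", regardless of the length of the string.)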

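If the desired output is correct, this can be broken down into three separate steps:

First, the delimiter indicates a key-value relationship, so before doing anything else you can use a dictionary to group the data that share the same key.

Second, take combinations of the groups with different keys, of whatever length you need.

Finally, for each of these combinations, use itertools.product to get all possible pairs within the combination.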
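The first step depends on extracting the text before the delimiter. A minimal sketch of that extraction, assuming the key is everything before the first ":" and that surrounding whitespace should be ignored; the helper name key_of is hypothetical and not part of the question or the answer:

```python
def key_of(text, sep=":"):
    """Return the part of text before the first separator (hypothetical helper)."""
    # str.partition splits on the first occurrence only and always returns
    # a 3-tuple, so no regular expression is needed here.
    head, _found, _tail = text.partition(sep)
    return head.strip()


print(key_of("TEST1: sub1"))                      # TEST1
print(key_of("TESTING FOR FUN: random text x2"))  # TESTING FOR FUN
print(key_of("a: b: c"))                          # a
print(key_of("no delimiter"))                     # no delimiter
```

If the separator is missing, str.partition returns the whole string as the first part, so such an item simply becomes its own key.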

from itertools import combinations, product
from collections import defaultdict

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]


inDict = defaultdict(list)
for lst in inList:
    key = lst[0].partition(':')[0]
    inDict[key].append(lst)

print(inDict)
#Output:
defaultdict(list,
            {'TEST1': [['TEST1: sub1'], ['TEST1: sub2'], ['TEST1: sub3']],
             'TESTING FOR FUN': [['TESTING FOR FUN: randomtext'],
              ['TESTING FOR FUN: random text x2']],
             'ABC123': [['ABC123: dog']]})


temp = combinations(inDict.values(), 2) #2 length pairs from all dict values. change the number here as needed
result = []
for group in temp:
    result.extend(product(*group)) #calculate all products for each pair of lists. 

print(result)
#Output:
[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]
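As an alternative that stays closer to the wording of the question, the combinations can also be generated first and then filtered. This is only a sketch, not the approach used in the answer above; the function names prefix and distinct_prefix_combinations are hypothetical, and each list item is assumed to hold exactly one string:

```python
from itertools import combinations


def prefix(item, sep=":"):
    """Text before the first separator of a one-element list item."""
    return item[0].partition(sep)[0]


def distinct_prefix_combinations(items, r, sep=":"):
    """Yield r-length combinations in which no prefix is repeated."""
    for combo in combinations(items, r):
        prefixes = {prefix(item, sep) for item in combo}
        if len(prefixes) == r:  # every item contributed a different prefix
            yield combo


inList = [['TEST1: sub1'],
          ['TEST1: sub2'],
          ['TEST1: sub3'],
          ['TESTING FOR FUN: randomtext'],
          ['TESTING FOR FUN: random text x2'],
          ['ABC123: dog']]

filtered = list(distinct_prefix_combinations(inList, 2))
print(len(filtered))  # 11
```

This keeps the results in the order produced by combinations, but it builds every combination before discarding some of them, so the grouping approach is likely to do less work when many items share a key.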

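Your code shows combinations of length 3, but the example output you provided does not match: each entry is a combination of only 2 elements. There is definitely a better way, but I'm not sure which one you need. Could you clarify?

@Paritosh Singh you are a genius! Good catch. The length of 3 that I copied and pasted came from a different desired result and is incorrect; you are right. I tested with length 2 and got the desired result. This is exactly what I wanted, thank you very much! Could you explain the asterisk (*) in result.extend(product(*group))?

@Chris yep, it is called the unpacking operator. I use it to unpack the tuple containing the 2 lists before passing it to product. It effectively strips away the tuple, so it is similar to product(group[0], group[1]), but it can dynamically unpack however many arguments are passed. You may want to read further about it online or in some other resource.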
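A small sketch of what the unpacking in the last comment does. The sample tuple group is made up for illustration and only mimics the shape of one pair of grouped lists from the answer:

```python
from itertools import product

# Illustrative data: a tuple holding two groups, like one pair of dict values.
group = ([['TEST1: sub1'], ['TEST1: sub2']], [['ABC123: dog']])

unpacked = list(product(*group))              # the tuple is spread into 2 arguments
explicit = list(product(group[0], group[1]))  # the same call written out by hand

print(unpacked == explicit)  # True
print(unpacked)
# [(['TEST1: sub1'], ['ABC123: dog']), (['TEST1: sub2'], ['ABC123: dog'])]

# Without the asterisk, product receives a single iterable (the tuple itself)
# and yields 1-tuples of its elements instead of pairs.
not_unpacked = list(product(group))
print(len(not_unpacked[0]))  # 1
```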