Python 根据元素顺序查找集合的所有不相交子集

Python 根据元素顺序查找集合的所有不相交子集,python,list,python-2.7,subset,Python,List,Python 2.7,Subset,假设我想在Python2.7中实现一个解决方案 我有一个字符串列表,例如a=['AA','BB','CC','DD'] 期望的输出将是a的一组不相交子集,例如a_1、a_2。。。这样的 (A_1 U A_2 U ... U A_N) = A, (A_1 ∩ A_2 ∩ ... ∩ A_N) = Ø, 在遵守A中元素顺序的同时(A_1,A_2,…,A_N不能包含A中的非相邻元素) 对于一个国家来说,这将是: A_1,A_2。。。答: ['AA','BB','CC','DD'],Ø ['AA

假设我想在Python2.7中实现一个解决方案

我有一个字符串列表,例如a=['AA','BB','CC','DD']

期望的输出将是a的一组不相交子集,例如a_1、a_2。。。这样的

(A_1 U A_2 U ... U A_N) = A,

(A_1 ∩ A_2 ∩ ... ∩ A_N) = Ø, 
在遵守A中元素顺序的同时(A_1,A_2,…,A_N不能包含A中的非相邻元素)

对于一个国家来说,这将是:

A_1,A_2。。。答:

  • ['AA','BB','CC','DD'],Ø
  • ['AA'],['BB','CC','DD']
  • ['AA','BB'],['CC','DD']
  • ['AA','BB','CC'],['DD']
  • ['AA']、['BB']、['CC']、['DD']
  • ['AA'、['BB']、['CC']、['DD']
  • ['AA']、['BB'、['CC']、['DD']
  • ['AA']、['BB']、['CC'、['DD']
(希望我没有遗漏任何内容,但我想你明白了)


重点是高效-意味着相对快速且不会太浪费内存。我知道,对于一个更大的列表,组合的数量可能会爆炸,但我的列表永远不会超过5个元素。

我已经找到了一个类似问题的答案,其中唯一的区别是我想要所有的子集,而它们只需要最大长度为2的子集

该解决方案相当于找到所有可能的整数组合,其总和为n(输入列表的长度),然后将该解决方案重新映射到单词列表以找到其子集

针对其问题的伪代码:

push an empty list onto the stack;
while (the stack is not empty) {
  pop the top list off the stack;
  if (the sum of its entries is n)
    add it to the solution set;
  else if (the sum of its entries is less than n)
    add a 1 to a copy of the list and push it onto the stack;
    add a 2 to a copy of the list and push it onto the stack;
  }
}
此问题的伪代码(扩展):

我的Python实现:

import copy

def generate_subsets(words):

    # get length of word list
    list_len = len(words)

    # initialize stack, subset_lens list
    stack = [[], ]
    subset_lens = []

    while stack:
        current_item = stack.pop(-1)
        if sum(current_item) == list_len:
            subset_lens.append(current_item)
        elif sum(current_item) < list_len:
            for j in range(1, list_len+1):
                new_item = copy.deepcopy(current_item)
                new_item.append(j)
                stack.append(new_item)

    # remap subset lengths to actual word subsets
    subsets = []

    for subset_len in subset_lens:
        subset = []
        starting_index = 0
        for index in subset_len:
            subset.append('_'.join(words[starting_index:starting_index+index]))
            starting_index+= index
        subsets.append(subset)

    return subsets
输出:

['AA_BB_CC_DD']
['AA_BB_CC', 'DD']
['AA_BB', 'CC_DD']
['AA_BB', 'CC', 'DD']
['AA', 'BB_CC_DD']
['AA', 'BB_CC', 'DD']
['AA', 'BB', 'CC_DD']
['AA', 'BB', 'CC', 'DD']
如果有人能找到更有效的解决方案,我很乐意在答案/评论中看到

generate_subsets(['AA', 'BB', 'CC', 'DD'])
['AA_BB_CC_DD']
['AA_BB_CC', 'DD']
['AA_BB', 'CC_DD']
['AA_BB', 'CC', 'DD']
['AA', 'BB_CC_DD']
['AA', 'BB_CC', 'DD']
['AA', 'BB', 'CC_DD']
['AA', 'BB', 'CC', 'DD']