Algorithm: intersecting sets, resulting in a set of sets with collectively unique elements

Suppose I have the following sets:

X -> {1, 2, 3}  
Y -> {1, 4, 7}  
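Z -> {1, 4, 5}

I'm looking for a combination of intersections that yields a number of sets in which each element is unique across all of the sets. (In practice, this is a set of hashes where each element refers back to the sets it came from.)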

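A solution must satisfy the following conditions:

  • For each initial set, every element ends up in the result set created by intersecting the largest possible number of initial sets
  • That is, every element of an initial set must be in exactly one result set
  • The sets are effectively infinite, so iterating over all valid elements is not feasible, but set operations are fine
  • Any result set that contains no elements can be ignored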
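
The conditions above can be checked mechanically on small inputs. The following is only a sketch: `check_partition` is a hypothetical helper, not part of the question, and it uses ordinary Python sets, so it applies to toy examples rather than to the effectively infinite sets described above. The `expected` grouping is the one reported in the answer's output further down.

```python
def check_partition(sets, result):
    """Check a candidate answer against the conditions listed above.

    `result` is a list of (elements, membership) pairs, where `membership`
    holds the indices of the initial sets the elements belong to.
    """
    seen = set()
    for elements, membership in result:
        assert elements, "empty result sets should have been dropped"
        assert not (elements & seen), "an element appears in two result sets"
        seen |= elements
        for i, s in enumerate(sets):
            if i in membership:
                assert elements <= s          # present in every listed set
            else:
                assert not (elements & s)     # absent from every other set
    # Every element of every initial set must be covered.
    assert seen == set().union(*sets)
    return True


X, Y, Z = {1, 2, 3}, {1, 4, 7}, {1, 4, 5}
expected = [({1}, [0, 1, 2]), ({4}, [1, 2]), ({2, 3}, [0]), ({7}, [1]), ({5}, [2])]
print(check_partition([X, Y, Z], expected))
```

Requiring each result set to be contained in exactly its listed sets, and disjoint from the rest, is what enforces the "largest possible number of initial sets" condition.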
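The brute-force approach is to loop over the power set of the initial sets in reverse order, intersect each subset, and then subtract every previously found result set from that intersection: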

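from itertools import combinations

def brute_force(S):
    resulting_sets = {}  # frozenset of unique elements -> indices of the sets intersected
    # Visit the power set of S in reverse order: largest subsets first.
    for k in range(len(S), 0, -1):
        for idxs in combinations(range(len(S)), k):
            s = set.intersection(*(S[i] for i in idxs))
            for rs in resulting_sets:
                s -= rs  # remove elements already claimed by an earlier intersection
            if s:
                resulting_sets[frozenset(s)] = idxs
    return resulting_sets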

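Of course, the above is very inefficient at O(2^n * 2^(n/2)) set operations (corrected from my earlier estimate of O(n^2 log(n))), and for my purposes it may already be run n^2 times. Is there a better solution for this type of problem?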

Update: no iteration over the elements of any set; only set operations are used

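The algorithm builds the result sets constructively: each time a new source set is seen, we modify the existing sets of unique elements and/or add new ones.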

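The idea is that each new source set can be split into two parts: one containing values that have already been seen, and one containing new, unique values. The first part is further divided by the current result sets into subsets (at most as many as the size of the power set of the # of seen source sets). Each existing result set is in turn split into two parts: one that intersects the new source set and one that does not. The task is to update the result sets for each of these cases.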

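In terms of set operations, the complexity should be O(n*2^n). For the solution the OP posted, I think the complexity should be O(2^(2n)), because len(resulting_sets) is at most 2^n in the worst case.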
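
To make the worst case concrete, here is a small sketch that builds n sets in which every non-empty combination of memberships occurs and then counts the distinct result sets. The helper names `worst_case_sets` and `count_result_sets` are made up for this illustration, and the counting loops over elements, so it is a demonstration on toy data rather than something that respects the no-iteration constraint.

```python
def worst_case_sets(n):
    # Element e belongs to set i exactly when bit i of e is set, so each
    # value 1 .. 2**n - 1 has a different, non-empty membership pattern.
    return [{e for e in range(1, 2 ** n) if e >> i & 1} for i in range(n)]


def count_result_sets(sets):
    # One result set per distinct membership pattern.
    memberships = set()
    for e in set().union(*sets):
        memberships.add(frozenset(i for i, s in enumerate(sets) if e in s))
    return len(memberships)


for n in range(1, 6):
    print(n, count_result_sets(worst_case_sets(n)))
# 1 1
# 2 3
# 3 7
# 4 15
# 5 31
```

The count is 2^n - 1 non-empty result sets, which is where the exponential factor in the bounds above comes from.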

def solution(sets):
    result_sets = [] # list of (unique element set, membership) tuples
    for sid, s in enumerate(sets):
        s = set(s)  # copy, so the caller's sets are not modified below
        new_sets = []
        for unique_elements, membership in result_sets:
            # The intersect part has wider membership, while the other part
            # has less unique elements (maybe empty).
            # Wider membership must have not been seen before, so add as new.
            intersect = unique_elements & s
            # Special case if all unique elements exist in s, then update
            # in place
            if len(intersect) == len(unique_elements):
                membership.append(sid)
            elif len(intersect) != 0:
                unique_elements -= intersect
                new_sets.append((intersect, membership + [sid]))
            s -= intersect
            if len(s) == 0:
                break
        # Special syntax for Python: there are remaining elements in s
        # This is the part of unseen elements: add as a new result set
        else:
            new_sets.append((s, [sid]))
        result_sets.extend(new_sets)
    print(result_sets)

sets = [{1, 2, 3}, {1, 4, 7}, {1, 4, 5}]
solution(sets)

# output (Python 3):
# [({2, 3}, [0]), ({1}, [0, 1, 2]), ({7}, [1]), ({4}, [1, 2]), ({5}, [2])]
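
Because the constructive approach only needs intersection, difference, and an emptiness test, it can in principle run on sets that are never enumerated. The sketch below assumes a hypothetical `RangeSet` class that stores integers as disjoint half-open ranges; the question does not show the asker's actual representation, so the class, its methods, and `partition_ranges` are all illustrative assumptions. The emptiness test replaces the `len` comparison, since `len()` cannot report sizes near 2^64.

```python
class RangeSet:
    """Hypothetical integer set stored as sorted, disjoint ranges [lo, hi)."""

    def __init__(self, ranges=()):
        # Callers are assumed to pass non-overlapping ranges; empty ones are dropped.
        self.ranges = tuple(sorted((lo, hi) for lo, hi in ranges if lo < hi))

    def __bool__(self):
        return bool(self.ranges)

    def __and__(self, other):
        out = []
        for a_lo, a_hi in self.ranges:
            for b_lo, b_hi in other.ranges:
                lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
                if lo < hi:
                    out.append((lo, hi))
        return RangeSet(out)

    def __sub__(self, other):
        pieces = list(self.ranges)
        for b_lo, b_hi in other.ranges:
            nxt = []
            for lo, hi in pieces:
                if lo < min(hi, b_lo):      # part to the left of [b_lo, b_hi)
                    nxt.append((lo, min(hi, b_lo)))
                if max(lo, b_hi) < hi:      # part to the right of [b_lo, b_hi)
                    nxt.append((max(lo, b_hi), hi))
            pieces = nxt
        return RangeSet(pieces)


def partition_ranges(sets):
    result = []  # list of [unique RangeSet, membership list]
    for sid, s in enumerate(sets):
        new = []
        for entry in result:
            unique, membership = entry
            common = unique & s
            if not common:
                continue
            rest = unique - common
            if not rest:
                membership.append(sid)      # whole result set lies inside s
            else:
                entry[0] = rest             # split the existing result set
                new.append([common, membership + [sid]])
            s = s - common
            if not s:
                break
        if s:                               # elements not seen before
            new.append([s, [sid]])
        result.extend(new)
    return result


X = RangeSet([(0, 2 ** 63)])
Y = RangeSet([(2 ** 62, 2 ** 64)])
for unique, membership in partition_ranges([X, Y]):
    print(unique.ranges, membership)
```

For these two inputs the result is three ranges: [0, 2^62) belonging only to the first set, [2^62, 2^63) belonging to both, and [2^63, 2^64) belonging only to the second. Adjacent ranges are not merged and the pairwise loops are quadratic in the number of ranges, which is acceptable for a sketch but would need care in a real implementation.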

--------------- Original answer below ---------------

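The idea is to find the "membership" of each unique element, i.e. which sets it belongs to. We then build a dictionary that groups all elements by their membership, which yields the requested sets. The complexity is O(n*len(sets)), or O(n^2) in the worst case.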

def solution(sets):
    union = set().union(*sets)
    memberships = {}  # membership tuple -> list of elements with that membership
    for e in union:
        membership = tuple(i for i, s in enumerate(sets) if e in s)
        if membership not in memberships:
            memberships[membership] = []
        memberships[membership].append(e)
    print(memberships)

sets = [{1, 2, 3}, {1, 4, 7}, {1, 4, 5}]
solution(sets)

# output (key order may vary):
# {(0, 1, 2): [1], (1, 2): [4], (0,): [2, 3], (1,): [7], (2,): [5]}

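Thanks, but unfortunately I can't practically loop over each element. The sets themselves contain up to 2^64 elements and are tracked through excluded sets and element ranges rather than through each individual element. One of the bullet points states that constraint.

@Lindenk Are you counting the complexity in terms of set operations? Note that set intersection is O(min(M, N)), while a membership test is O(1), in terms of the # of elements.

Ouch, you're right. For some reason I assumed a power set was n^2 when it's actually 2^n. Wow, that's much worse.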