How to quickly get all intersections of a collection of sets in Python


python algorithm


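I want to compute, in Python, all (distinct) intersections of a collection of finite sets of integers (implemented here as a list of lists). To avoid confusion, a formal definition is given at the end of the question.

I have an algorithm that does this iteratively, but it is rather slow (should I post it?). The test case is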

[[0, 1, 2, 3, 4, 9], [0, 1, 4, 5, 6, 10], [0, 2, 4, 5, 7, 11], [1, 3, 4, 6, 8, 12], [2, 3, 4, 7, 8, 13], [4, 5, 6, 7, 8, 14], [0, 1, 9, 10, 15, 16], [0, 2, 9, 11, 15, 17], [1, 3, 9, 12, 16, 18], [2, 3, 9, 13, 17, 18], [9, 15, 16, 17, 18, 19], [0, 5, 10, 11, 15, 20], [1, 6, 10, 12, 16, 21], [10, 15, 16, 19, 20, 21], [5, 6, 10, 14, 20, 21], [11, 15, 17, 19, 20, 22], [5, 7, 11, 14, 20, 22], [2, 7, 11, 13, 17, 22], [7, 8, 13, 14, 22, 23], [3, 8, 12, 13, 18, 23], [13, 17, 18, 19, 22, 23], [14, 19, 20, 21, 22, 23], [6, 8, 12, 14, 21, 23], [12, 16, 18, 19, 21, 23]]

which takes 2.5 seconds to compute.

Is there a way to do this quickly?

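Formal definition (genuinely hard to write without LaTeX mode): let A = {A1, ..., An} be a finite set of finite sets Ai of non-negative integers. The output should then be the set { intersection of the sets in B : B a subset of A }.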

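So the formal algorithm is to compute the intersection of every subset of A and collect the distinct results, but that obviously takes forever.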
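For reference, the formal definition can be read off directly as a brute-force enumeration. The following is only a minimal sketch for tiny inputs: the function name `brute_force_intersections` is hypothetical, and it assumes only non-empty subsets B are considered (the empty subset has no obvious intersection without a convention).

```python
from itertools import combinations

def brute_force_intersections(lists):
    """Intersect every non-empty sub-collection; exponential in len(lists)."""
    sets = [frozenset(s) for s in lists]
    found = set()
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            # frozenset.intersection accepts any number of sets
            found.add(frozenset.intersection(*combo))
    return found

small = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
result = brute_force_intersections(small)
print(sorted(sorted(s) for s in result))
# [[0, 1, 2], [1, 2], [1, 2, 3], [2], [2, 3], [2, 3, 4]]
```

With 24 input sets this loop would visit 2**24 - 1 sub-collections, which is why a smarter approach is needed.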

Thanks a lot.

Here is a recursive solution. On your test example it is almost instantaneous:

def allIntersections(frozenSets):
    if len(frozenSets) == 0:
        return []
    else:
        head = frozenSets[0]
        tail = frozenSets[1:]
        tailIntersections = allIntersections(tail)
        newIntersections = [head]
        newIntersections.extend(tailIntersections)
        newIntersections.extend(head & s for s in tailIntersections)
        return list(set(newIntersections))

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]

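Edit: here is a cleaner, non-recursive implementation of the same idea.

The problem is simplest if the intersection of the empty collection is defined to be the universal set, and a suitable universal set can be obtained as the union of all the elements. This is a standard move in lattice theory, and it is dual to taking the union of the empty collection to be the empty set. You can always discard this universal set if you don't want it: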

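def allIntersections(frozenSets):
    # Intersection of the empty collection: the union of all inputs.
    # frozenset().union(...) also works when frozenSets is empty.
    universalSet = frozenset().union(*frozenSets)
    intersections = {universalSet}
    for s in frozenSets:
        # Intersect s with everything found so far, then merge the results in
        moreIntersections = {s & t for t in intersections}
        intersections.update(moreIntersections)
    return intersections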

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]

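The reason this is so fast on your test example is that, even though your collection has 24 sets and therefore 2**24 (16.8 million) potential intersections, there are in fact only 242 distinct intersections (241 if you don't count the empty intersection). So the number of intersections held on each pass through the loop is at most a few hundred.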
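One way to watch how the working set grows is to record its size after each pass. This is only an illustrative sketch on a small made-up input; the helper name `intersection_counts` is hypothetical and not part of the answer above.

```python
def intersection_counts(lists):
    """Return the number of distinct intersections held after each pass."""
    frozen = [frozenset(s) for s in lists]
    # Start from the universal set (intersection of the empty collection)
    intersections = {frozenset().union(*frozen)}
    counts = []
    for s in frozen:
        # The comprehension is fully built before the update happens
        intersections |= {s & t for t in intersections}
        counts.append(len(intersections))
    return counts

small = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(intersection_counts(small))  # [2, 4, 7]
```

Each pass costs one intersection per set currently held, so the total work is proportional to the sum of these counts rather than to 2**n.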

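It is possible to choose 24 sets so that all 2**24 possible intersections are in fact distinct, so it is easy to see that the worst case is exponential. But if, as in your test example, the number of intersections is small, this approach lets you compute them quickly.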
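As an illustration of the worst case, one family with this property (an assumption chosen for the sketch, not taken from the answer) is the n sets obtained by removing a single element from {0, ..., n-1}: intersecting the sets indexed by B leaves exactly the complement of B, so every sub-collection gives a different result. The sketch uses n = 10 to keep it fast.

```python
def all_intersections_with_universe(frozen_sets):
    """Incremental computation, keeping the universal set in the result."""
    universal = frozenset().union(*frozen_sets)
    intersections = {universal}
    for s in frozen_sets:
        intersections |= {s & t for t in intersections}
    return intersections

n = 10
universe = frozenset(range(n))
# Each set is the universe with one element removed
family = [universe - {i} for i in range(n)]
result = all_intersections_with_universe(family)
print(len(result), 2 ** n)  # 1024 1024
```

Raising n to 24 would produce 2**24 distinct sets, so no algorithm that lists them all can avoid exponential time on such an input.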

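One potential optimization might be to sort the sets by increasing size before looping over them. Processing the smaller sets first may cause more empty intersections to appear earlier, keeping the total number of distinct intersections smaller until near the end of the loop.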
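A minimal sketch of that idea follows. The function name and the `presort` flag are hypothetical, and whether the ordering actually helps is untested here; the final result is the same either way, since only the order of processing changes.

```python
def all_intersections_presorted(lists, presort=True):
    """Incremental computation, optionally visiting smaller sets first."""
    frozen = [frozenset(s) for s in lists]
    if presort:
        # Smaller sets first, in the hope of keeping the working set small
        frozen.sort(key=len)
    intersections = {frozenset().union(*frozen)}
    for s in frozen:
        intersections |= {s & t for t in intersections}
    return intersections

data = [[0, 1, 2, 3], [1, 2], [2, 3, 4]]
same = all_intersections_presorted(data, True) == all_intersections_presorted(data, False)
print(same)  # True
```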

An iterative solution that takes about 3.5 ms on my machine for your large test input:

from itertools import starmap, product
from operator import and_

def all_intersections(sets):
    # Convert to set of frozensets for uniquification/type correctness
    last = new = sets = set(map(frozenset, sets))
    # Keep going until further intersections add nothing to results
    while new:
        # Compute intersection of old values with newly found values
        new = set(starmap(and_, product(last, new)))
        last = sets.copy()  # Save off prior state
        new -= last         # Determine truly newly added values
        sets |= new         # Accumulate newly added values in complete set
    # No more intersections being generated, convert results to canonical
    # form, list of lists, where each sublist is displayed in order, and
    # the top level list is ordered first by size of sublist, then by contents
    return sorted(map(sorted, sets), key=lambda x: (len(x), x))

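Basically, it just keeps intersecting the old result set with the newly found intersections until a round of intersections changes nothing, at which point it is done.

Note: this is not actually the best solution. The recursive approach is algorithmically good enough to win on the test data: John Coleman's solution, after adding sorting to the outer wrapper so that its output matches this format, takes about 0.94 ms, versus 3.5 ms for mine. I am mainly leaving it here as an example of solving the problem another way.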
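A comparison of this kind could be set up roughly as follows. This is a sketch under assumptions: the function names `incremental` and `fixed_point` are made up here, the input is a small stand-in rather than the 24-set test case, and the timings will differ from machine to machine.

```python
from itertools import product, starmap
from operator import and_
from timeit import timeit

def incremental(lists):
    frozen = [frozenset(s) for s in lists]
    intersections = {frozenset().union(*frozen)}
    for s in frozen:
        intersections |= {s & t for t in intersections}
    # Sort in the wrapper so the output format matches fixed_point
    return sorted(map(sorted, intersections), key=lambda x: (len(x), x))

def fixed_point(lists):
    last = new = sets = set(map(frozenset, lists))
    while new:
        new = set(starmap(and_, product(last, new)))
        last = sets.copy()
        new -= last
        sets |= new
    return sorted(map(sorted, sets), key=lambda x: (len(x), x))

data = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
# incremental() also returns the universal set; drop it before comparing.
# This is only valid because the universal set is not itself an input here.
universal = sorted(set().union(*data))
a = [s for s in incremental(data) if s != universal]
b = fixed_point(data)
print(a == b)  # True
print(timeit(lambda: incremental(data), number=1000))
print(timeit(lambda: fixed_point(data), number=1000))
```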

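Comments:

This looks more like "all subsets of the union of the inputs".

What do you mean by the intersection of two lists? Lists don't have a well-defined intersection operator, but sets do. For example, is [0,1] intersected with [1,0] empty? [0,1]? [1,0]? Also, do you mean all intersections of pairs of lists, or all intersections of tuples of lists (including triples, etc.)?

No. In the example above, [1,3] is a subset of the union of the input sets, but it is not an intersection of input sets, so it is not in the output.

@John Coleman: I do think of the lists as being intersected as sets. I will clarify that right away.

Why do you want to do this?

@ChristianStump: Actually, that takes almost no time, because the set is either growing or not, and when it grows, the comparison short-circuits by simply checking lengths (length is tested first, and elements are compared one by one only if the lengths match). The flaw is mainly caused by repeated work: it recombines elements that were already combined in previous loops.

@ChristianStump: My explanation is now somewhat wrong, because I improved the code to eliminate that flaw (it now intersects newly found intersections with the old ones, avoiding re-intersecting old with old in the second and subsequent loops). But the recursive solution avoids the extra overhead of set construction, set copying, and set difference, so mine is still slower.

This is nice. I'm not familiar with starmap. I'm still amazed at how much you can do with itertools. I have a follow-up question, if you are interested in considering it.

Great! "This is a standard move in lattice theory": to make it a lattice, I do indeed need this set as well (otherwise the join is not well defined). I have a follow-up question, if you are interested in considering it.

@ChristianStump: That does sound interesting, though I'm not sure I have enough time today to think about it seriously. I have some family responsibilities.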