Python: fastest algorithm to get all groups of tuples that have the same N intersections, from a list of tuples without duplicates
Tags: python, algorithm, performance, tuples, intersection

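I have a list of 100 tuples. Each tuple contains 5 unique integers. I would like to know the fastest way to find all groups that have exactly the same N = 2 intersections. If a tuple has several pairs of elements that form a 2-element intersection with other tuples, find all of them and store them in different groups. The expected output is a list of unique lists ([(1,2,3,4,5),(4,5,6,7,8)] is the same as [(4,5,6,7,8),(1,2,3,4,5)]), where each list is a group whose tuples all have the same N = 2 intersections. Here is my code: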

from collections import defaultdict
from random import sample, choice

lst =  [tuple(sample(range(10), 5)) for _ in range(100)]

dct = defaultdict(list)
N = 2
for i in lst:
    for j in lst:
        if len(set(i).intersection(set(j))) == N:
            dct[i].append(j)
key = choice(list(dct))
print([key] + dct[key])
>>> [(4, 5, 2, 3, 7), (4, 6, 2, 5, 0), (9, 4, 2, 1, 8), (7, 6, 5, 2, 0), (2, 4, 0, 7, 8)]
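Clearly, each of the last 4 tuples has 2 intersections with the first tuple, but not necessarily the same 2 elements. So how should I get the tuples that have the same 2 intersections?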

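An obvious solution is to brute-force enumerate all possible (x, y) integer pairs and group the tuples that have that (x, y) intersection, but is there a faster algorithm for this?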
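The enumeration idea mentioned above can be sketched without comparing tuples pairwise: each tuple is filed under every N-element subset it contains, so 100 tuples of 5 integers need only 100 × 10 dictionary insertions for N = 2. This is a minimal sketch, not a definitive solution; the function name `group_by_shared_subset` is an assumption made for illustration. It also only guarantees that the members of a group contain the key, so two members may still overlap in more than N elements.

```python
from collections import defaultdict
from itertools import combinations


def group_by_shared_subset(tuples, n):
    """Bucket tuples by every n-element subset they contain (hypothetical helper)."""
    buckets = defaultdict(list)
    for t in tuples:
        # every n-element subset of the tuple is a candidate intersection
        for subset in combinations(sorted(set(t)), n):
            buckets[frozenset(subset)].append(t)
    # a subset only forms a group if at least two tuples contain it
    return {key: group for key, group in buckets.items() if len(group) >= 2}


example = [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)]
groups = group_by_shared_subset(example, 2)
print(groups[frozenset({4, 5})])
```

For this input the only shared pair is {4, 5}, and all three tuples land in its group.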


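Edit: [(1,2,3,4,5),(4,5,6,7,8),(4,5,9,10,11)] is allowed to be in the same group, but [(1,2,3,4,5),(4,5,6,7,8),(4,5,6,10,11)] is not, because (4,5,6,7,8) has 3 intersections with (4,5,6,10,11). In that case it should be split into 2 groups, [(1,2,3,4,5),(4,5,6,7,8)] and [(1,2,3,4,5),(4,5,6,10,11)]. The final result will of course contain groups of various sizes, including many short lists with only two tuples, but that is exactly what I want.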
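The rule in the edit can be written down as a small check, which is handy for testing any candidate grouping. This is only a sketch of the rule as I read it: a group is valid when every pair of its tuples overlaps in one and the same set of N elements. The name `is_valid_group` is hypothetical.

```python
from itertools import combinations


def is_valid_group(group, n):
    """Return True if every pair of tuples in `group` shares exactly the same n elements."""
    overlaps = {frozenset(a) & frozenset(b) for a, b in combinations(group, 2)}
    # exactly one distinct overlap across all pairs, and it has size n
    return len(overlaps) == 1 and len(next(iter(overlaps))) == n


allowed = [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)]
rejected = [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 6, 10, 11)]
print(is_valid_group(allowed, 2))   # True
print(is_valid_group(rejected, 2))  # False: two of the tuples share {4, 5, 6}
```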

A simple combinations-based approach will suffice:

from collections import defaultdict
from itertools import combinations

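res = defaultdict(set)
# `tuples` is assumed to hold the input list of tuples
for t1, t2 in combinations(tuples, 2):
    overlap = set(t1) & set(t2)
    if len(overlap) == 2:
        # group both tuples under the pair of elements they share
        cur = res[frozenset(overlap)]
        cur.add(t1)
        cur.add(t2)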
Result:

defaultdict(set,
            {frozenset({2, 4}): {(2, 4, 0, 7, 8),
              (4, 5, 2, 2, 4),
              (4, 6, 2, 6, 0),
              (8, 4, 2, 1, 8)},
             frozenset({2, 5}): {(4, 5, 2, 2, 4), (7, 6, 5, 2, 0)}})
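A group built this way contains every tuple that shares the key pair with at least one other tuple, so two members can still overlap in more than two elements, which the question's edit rules out. One possible way to split such a group, offered as a sketch rather than as part of the answer above, is to treat tuples as compatible when their overlap is exactly the key and to list the maximal sets of mutually compatible tuples with a basic Bron-Kerbosch recursion. The name `split_into_strict_groups` is hypothetical, and the recursion can be exponential in the worst case, so it is only reasonable for small groups.

```python
def split_into_strict_groups(members, key):
    """Split one group into maximal subgroups whose members pairwise share exactly `key`."""
    key = frozenset(key)
    members = list(set(members))
    # two tuples are compatible when their overlap is exactly the key, nothing more
    compatible = {
        m: {o for o in members if o != m and frozenset(m) & frozenset(o) == key}
        for m in members
    }
    found = []

    def expand(chosen, candidates, excluded):
        # basic Bron-Kerbosch recursion (no pivoting): report maximal cliques
        if not candidates and not excluded:
            if len(chosen) >= 2:
                found.append(set(chosen))
            return
        for v in list(candidates):
            expand(chosen | {v}, candidates & compatible[v], excluded & compatible[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(frozenset(), set(members), set())
    return found


group = {(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 6, 10, 11)}
strict_groups = split_into_strict_groups(group, {4, 5})
print(strict_groups)
```

For the example from the question's edit this yields the two groups the asker expects, one pairing (1, 2, 3, 4, 5) with (4, 5, 6, 7, 8) and one pairing it with (4, 5, 6, 10, 11); the order in which they are listed may vary.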

I like how clean @acushner's solution looks, but I wrote one that is quite a bit faster:

def all_n_intersections2(xss, n):
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    result[ixs] = [xsa, xsb]
                else:
                    result[ixs].append(xsb)
    return result
If I pit them against each other:

from timeit import timeit
from random import sample

from collections import defaultdict
from itertools import combinations


def all_n_intersections1(xss, n):
    res = defaultdict(set)
    for t1, t2 in combinations(xss, 2):
        overlap = set(t1) & set(t2)
        if len(overlap) == n:
            cur = res[frozenset(overlap)]
            cur.add(t1)
            cur.add(t2)
    return res


def all_n_intersections2(xss, n):
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    result[ixs] = [xsa, xsb]
                else:
                    result[ixs].append(xsb)
    return result


data = [tuple(sample(range(10), 5)) for _ in range(100)]

print(timeit(lambda: all_n_intersections1(data, 2), number=1000))
print(timeit(lambda: all_n_intersections2(data, 2), number=1000))
Result:

3.4294801999999995
1.4871790999999999
With some comments added:

def all_n_intersections2(xss, n):
    # using frozensets to be able to use them as dict keys, convert only once
    xss = [frozenset(xs) for xs in xss]
    result = {}
    # keep going until there are no more items left to combine
    while xss:
        # popping to compare against all others remaining, intersect each pair only once
        xsa = xss.pop()
        for xsb in xss:
            # using library intersection, assuming the native implementation is fastest
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    # not using default dict, initialising with these two
                    result[ixs] = [xsa, xsb]
                else:
                    # otherwise, xsa was already in there, appending xsb
                    result[ixs].append(xsb)
    return result
What the solution does:

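For each combination of xsa, xsb from xss, it computes the intersection. If the intersection ixs has the target length n, xsa and xsb are added to a list in the dictionary, using ixs as the key. The intent is to avoid duplicate appends unless the source data contains duplicate tuples. Note, however, that xsb is appended every time a later xsa shares the same n elements with it, so a list can contain the same tuple more than once, and an xsa that first meets an existing key is not added itself.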
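A small input makes the behaviour of the list-based version easy to inspect. The sketch below repeats the function as posted and compares it with a hypothetical variant, `all_n_intersections_sets`, that keeps the same loop but stores each group in a set and always adds both tuples. The variant is an assumption of mine, not the answerer's code, and it may not reproduce the timings reported above.

```python
def all_n_intersections2(xss, n):
    """The list-based version, as posted in the answer."""
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa.intersection(xsb)
            if len(ixs) == n:
                if ixs not in result:
                    result[ixs] = [xsa, xsb]
                else:
                    result[ixs].append(xsb)
    return result


def all_n_intersections_sets(xss, n):
    """Hypothetical variant: same pairwise loop, but each group is a set."""
    xss = [frozenset(xs) for xs in xss]
    result = {}
    while xss:
        xsa = xss.pop()
        for xsb in xss:
            ixs = xsa & xsb
            if len(ixs) == n:
                # add both members every time; the set drops repeats
                result.setdefault(ixs, set()).update((xsa, xsb))
    return result


# three tuples that pairwise share exactly {4, 5}
data = [(1, 2, 3, 4, 5), (4, 5, 6, 7, 8), (4, 5, 9, 10, 11)]
listed = all_n_intersections2(data, 2)[frozenset({4, 5})]
groups = all_n_intersections_sets(data, 2)
print(len(listed))                     # 4 entries for 3 distinct tuples
print(len(groups[frozenset({4, 5})]))  # 3
```

In the list-based run, (1, 2, 3, 4, 5) is appended once when paired with (4, 5, 9, 10, 11) and again when paired with (4, 5, 6, 7, 8); wrapping each list in `set(...)` afterwards removes such repeats but does not restore members that were never added.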

Each group should contain tuples with exactly the same 2 intersecting elements; if one of the tuples has 3 intersections with another tuple, that is not allowed.
Does this code work? Are you asking how to solve the problem, or how to find the fastest way to solve it?
No, the code only returns the tuples that have 2 intersections with the first tuple.
I'm not sure I understand "the same 2 intersections" correctly. So [(1,2,3,4), (1,2,5,6), (3,4,5,6)] is not allowed?
Assuming the intersection is (1,2), then of course the third tuple is not allowed.
This looks good. Could you change the tuples so that they contain only unique integers and show the result again?
Wouldn't this add a tuple to the pair's group in res as soon as it shares exactly that pair with any other tuple? Try it with tuples = [(1,2,3), (1,2,4,5), (1,2,4,6)].
@tobias_k Hmm, I feel the question isn't clear enough as it stands. Your example would produce defaultdict(set, {frozenset({1, 2}): {(1, 2, 3), (1, 2, 4, 5), (1, 2, 4, 6)}}), and I can see how that might cause confusion. Maybe the best solution is to add pairs of tuples to the set; from there you can work out how the tuples are actually connected. In your example, that result resembles the one the question rules out.
The question is interesting, and I don't think the solution will be simple or fast, but it's really not entirely clear to me how the different edge cases should be handled.
A smarter approach... *oh wait, unless he only cares about the first tuple versus all possible pairs of tuples.
Good catch, updated, and faster.
For those wondering: the try ... except block was needed but was missing. Edit: replaced the try.
Is this the same code, or am I missing a subtle difference? Just show it once, with comments, and then the other approach for reference.
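The suggestion in the comments to store pairs of tuples rather than merged groups could look roughly like this. It is a sketch under my own reading of that comment; `pairs_by_intersection` is a hypothetical name, and the input is an example in the spirit of the one raised above.

```python
from collections import defaultdict
from itertools import combinations


def pairs_by_intersection(tuples, n):
    """Map each n-element intersection to the set of tuple pairs that produce it."""
    pairs = defaultdict(set)
    for t1, t2 in combinations(tuples, 2):
        overlap = frozenset(t1) & frozenset(t2)
        if len(overlap) == n:
            # keep the pair itself, so it stays visible which tuples are directly linked
            pairs[overlap].add((t1, t2))
    return dict(pairs)


tuples = [(1, 2, 3), (1, 2, 4, 5), (1, 2, 4, 6)]
result = pairs_by_intersection(tuples, 2)
print(result)
```

Here (1, 2, 4, 5) and (1, 2, 4, 6) never appear together in a pair, because they share three elements; each is linked only to (1, 2, 3). Keeping the pairs leaves it to a later step to decide how linked tuples should be merged into groups.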