
Python: merging a list of sets


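Given a list of sets of strings, such as setlist = [{'this', 'is'}, {'is', 'a'}, {'test'}], the idea is to join (union) pairwise every two sets that share a string. The snippet below takes the literal approach: test pairs for overlap, join them, and start over by breaking out of the inner loop.

I know this is the pedestrian approach, and it takes forever on lists of a useful size (200K sets of 2 to 10 strings each).

Any advice on how to make this more efficient? Thanks.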

merged = True
while merged:                       # repeat until a full pass finds no overlap
    merged = False
    for i in range(len(setlist) - 1):
        for j in range(i + 1, len(setlist)):
            a = setlist[i]
            b = setlist[j]
            if not set(a).isdisjoint(b):     # ... then join them
                newset = set.union(a, b)     # ... new set
                del setlist[j]               # ... drop highest index
                del setlist[i]               # ... drop lowest index
                setlist.insert(0, newset)    # ... introduce consolidated set, which messes up i, j
                merged = True
                break                        # ... back to the top for fresh i, j
        if merged:
            break                            # leave the outer for loop as well

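As @user2357112 mentioned in the comments, this can be treated as a graph problem. Every set is a vertex, and every word shared between two sets is an edge. You can then iterate over the vertices and run a BFS (or DFS) from every unseen vertex to produce one connected component; the union of the sets in a component is one merged set.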
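The graph approach described above can be sketched as follows. This is only a minimal illustration, not code from the answer: the function name `merge_bfs` and the helper index `word_to_sets` are assumptions introduced here. The index maps each word to the sets containing it, so the edges are never materialised explicitly.

```python
from collections import defaultdict, deque

def merge_bfs(setlist):
    # Hypothetical helper index: word -> indices of the sets that contain it.
    # It stands in for the graph's edges, so no explicit edge list is built.
    word_to_sets = defaultdict(list)
    for i, s in enumerate(setlist):
        for w in s:
            word_to_sets[w].append(i)

    seen = [False] * len(setlist)
    result = []
    for start in range(len(setlist)):
        if seen[start]:
            continue
        # BFS from an unseen vertex collects one connected component.
        seen[start] = True
        component = set()
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for w in setlist[i]:
                if w in component:
                    continue  # this word was already expanded
                component.add(w)
                for j in word_to_sets[w]:
                    if not seen[j]:
                        seen[j] = True
                        queue.append(j)
        result.append(component)
    return result

setlist = [{'this', 'is'}, {'is', 'a'}, {'test'}, {'foo'},
           {'foobar', 'foo'}, {'foobar', 'bar'}, {'alone'}]
# Sorted only to make the printed form stable.
print(sorted(sorted(g) for g in merge_bfs(setlist)))
# [['a', 'is', 'this'], ['alone'], ['bar', 'foo', 'foobar'], ['test']]
```

Each set and each word is expanded once, so the work should grow roughly linearly with the total number of strings across all sets.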

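Another option is to use union-find. The advantage of union-find is that you don't need to construct the graph, and there is no degenerate case when all the sets have the same contents. Here is an example of it in action: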

from collections import defaultdict

# Return ancestor of given node
def ancestor(parent, node):
    if parent[node] != node:
        # Do path compression
        parent[node] = ancestor(parent, parent[node])

    return parent[node]

def merge(parent, rank, x, y):
    # Merge sets that x & y belong to
    x = ancestor(parent, x)
    y = ancestor(parent, y)

    if x == y:
        return

    # Union by size (kept in `rank`): attach the smaller tree under the larger one
    if rank[y] > rank[x]:
        x, y = y, x

    parent[y] = x
    rank[x] += rank[y]

def merge_union(setlist):
    # For each word, record the indices of the sets that contain it
    words = defaultdict(list)

    for i, s in enumerate(setlist):
        for w in s:
            words[w].append(i)

    # Merge sets that share the word
    parent = list(range(len(setlist)))
    rank = [1] * len(setlist)
    for sets in words.values():
        it = iter(sets)
        merge_to = next(it)
        for x in it:
            merge(parent, rank, merge_to, x)

    # Build the result by unioning all sets that share a root.
    # parent[i] may be stale (not yet compressed), so look up the root explicitly.
    result = defaultdict(set)
    for i, s in enumerate(setlist):
        result[ancestor(parent, i)] |= s

    return list(result.values())

setlist = [
    {'this', 'is'},
    {'is', 'a'},
    {'test'},
    {'foo'},
    {'foobar', 'foo'},
    {'foobar', 'bar'},
    {'alone'}
]

print(merge_union(setlist))
Output:

[{'this', 'is', 'a'}, {'test'}, {'bar', 'foobar', 'foo'}, {'alone'}]
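Because sets are unordered, the order of the elements inside each printed set can differ between runs, so comparing printed output is fragile. A minimal sketch of an order-independent comparison, where `normalize` is a hypothetical helper name introduced here and not part of the answer:

```python
def normalize(groups):
    # Turn a list of sets into a set of frozensets so that neither
    # the order of the groups nor the order of their elements matters.
    return {frozenset(g) for g in groups}

expected = [{'this', 'is', 'a'}, {'test'}, {'bar', 'foobar', 'foo'}, {'alone'}]
reordered = [{'alone'}, {'foo', 'bar', 'foobar'}, {'a', 'this', 'is'}, {'test'}]
print(normalize(expected) == normalize(reordered))  # True
```

The same helper could be used to check that two different merging approaches agree on a given input.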


What is setlist?

I edited the question and added an example setlist; the expected output is [{'this', 'is', 'a'}, {'test'}].

So your expected output is "this is a test"?

There is a related discussion with a range of timing comparisons and a potentially O(n) algorithm (I haven't checked the logic myself).