Python: merging a list of sets
Tags: python, python-2.7
Given a list of sets of strings, such as setlist = [{'this', 'is'}, {'is', 'a'}, {'test'}], the idea is to join, pairwise, the sets that share strings. The snippet below takes the literal approach: test pairs for overlap, join them, and start over by breaking out of the inner loop.
I know this is the pedestrian way, and it takes a very long time for lists of usable size (200K sets of 2 to 10 strings each).
Any suggestions on how to make this more efficient? Thanks.
while True:
    for i in range(len(setlist) - 1):
        for j in range(i + 1, len(setlist)):
            a = setlist[i]
            b = setlist[j]
            if not a.isdisjoint(b):          # ... then join them
                newset = a | b               # ... new set
                del setlist[j]               # ... drop highest index
                del setlist[i]               # ... drop lowest index
                setlist.insert(0, newset)    # ... introduce consolidated set, which messes up i,j
                break                        # ... back to the top for fresh i,j
        else:
            continue                         # no overlap for this i, try the next one
        break                                # a merge happened, restart both loops
    else:
        break                                # both for loops are done without a merge
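As @user2357112 mentioned in the comments, this can be treated as a graph problem. Each set is a vertex, and each word shared between two sets is an edge. You can then iterate over the vertices and run BFS (or DFS) from every unseen vertex to generate a connected component.
Another option is to use union-find. The advantage of union-find is that you don't need to construct the graph, and there is no degenerate case when all the sets have the same contents. Here is an example of it in action: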
from collections import defaultdict

def ancestor(parent, node):
    # Return the root of the component that contains node
    if parent[node] != node:
        # Path compression: point node directly at its root
        parent[node] = ancestor(parent, parent[node])
    return parent[node]

def merge(parent, rank, x, y):
    # Merge the components that x and y belong to
    x = ancestor(parent, x)
    y = ancestor(parent, y)
    if x == y:
        return
    # Union by size (rank holds component sizes): attach the smaller component to the larger one
    if rank[y] > rank[x]:
        x, y = y, x
    parent[y] = x
    rank[x] += rank[y]

def merge_union(setlist):
    # For every word, record the indices of the sets that contain it
    words = defaultdict(list)
    for i, s in enumerate(setlist):
        for w in s:
            words[w].append(i)

    # Merge the sets that share a word
    parent = list(range(len(setlist)))
    rank = [1] * len(setlist)
    for sets in words.values():
        it = iter(sets)
        merge_to = next(it)
        for x in it:
            merge(parent, rank, merge_to, x)

    # Build the result by taking the union of the sets within each component.
    # Look up the root with ancestor(): parent[i] may still point at a non-root node.
    result = defaultdict(set)
    for i, s in enumerate(setlist):
        result[ancestor(parent, i)] |= s
    return list(result.values())

setlist = [
    {'this', 'is'},
    {'is', 'a'},
    {'test'},
    {'foo'},
    {'foobar', 'foo'},
    {'foobar', 'bar'},
    {'alone'}
]
print(merge_union(setlist))
Output (the order of elements within each set may vary):
[{'this', 'is', 'a'}, {'test'}, {'bar', 'foobar', 'foo'}, {'alone'}]
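The graph approach described above (sets as vertices, shared words as edges, BFS from every unseen vertex) could be sketched roughly as follows. This is only an illustration, not code from the original answer: the function name merge_bfs and the expanded bookkeeping set are assumptions introduced here. The edges are never materialized. Instead, a word-to-indices map stands in for the adjacency lists, and each word is followed at most once so that many sets sharing one word do not cause quadratic work.

```python
from collections import defaultdict, deque


def merge_bfs(setlist):
    # Hypothetical helper: merge overlapping sets via BFS over an implicit graph.
    # Map each word to the indices of the sets that contain it; two sets are
    # neighbours when they appear in the same index list.
    words = defaultdict(list)
    for i, s in enumerate(setlist):
        for w in s:
            words[w].append(i)

    seen = [False] * len(setlist)   # set indices already assigned to a component
    expanded = set()                # words whose index lists were already followed
    result = []
    for start in range(len(setlist)):
        if seen[start]:
            continue
        seen[start] = True
        component = set()
        queue = deque([start])
        while queue:
            i = queue.popleft()
            component |= setlist[i]
            for w in setlist[i]:
                if w in expanded:
                    continue        # every set with this word is already queued
                expanded.add(w)
                for j in words[w]:
                    if not seen[j]:
                        seen[j] = True
                        queue.append(j)
        result.append(component)
    return result
```

Called on the setlist from the example above, merge_bfs(setlist) should yield the same four groups as merge_union(setlist), possibly in a different order.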
What is setlist?
I edited the question and added a setlist example; the expected output is [{'this','is','a'},{'test'}].
So your expected output is "this is a test"?
For a series of timing comparisons and a potentially O(n) algorithm, see (I have not checked the logic myself)