Python: clustering a list of lists based on shared values

python algorithm


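I have a list of lists, where each sublist contains some integers:

o = [[1,2],[3,4],[2,3],[5,4]]

I would like to create a new list of lists in which any two sublists of o that share a common member are merged. The merging should continue until no two sublists share a common element. Given o, we would merge [1,2] with [2,3] because they share a 2, then merge that group with [3,4] because [1,2,3] and [3,4] both contain a 3, and so on.

The expected output of clustering o would be [[1,2,3,4,5]].

I have a hunch that there is an approach to this task that is far superior to my current one (below). Any suggestions on the most efficient way (in time and space) to accomplish this would be greatly appreciated.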

from collections import defaultdict

o = [[1,2],[3,4],[2,3],[5,4]]

def group_lists(list_of_lists):
  '''
  Given a list of lists, continue combining sublist
  elements that share an element until no two sublist
  items share an element.
  '''
  to_cluster = set(tuple(i) for i in list_of_lists)
  keep_clustering = True
  while keep_clustering:
    keep_clustering = False
    d = defaultdict(set)
    for i in to_cluster:
      for j in i:
        d[j].add(i)
    clustered = set()
    for i in d.values():
      # remove duplicate entries and sort, so that each cluster has a
      # single canonical tuple (set iteration order is not guaranteed)
      flat = tuple(sorted(set(item for sublist in i for item in sublist)))
      clustered.add(flat)
    if not to_cluster == clustered:
      keep_clustering = True
      to_cluster = clustered
  # done clustering!
  return clustered

print(group_lists(o))

You can use recursion:

def cluster(d, current = []):
  options = [i for i in d if any(c in current for c in i)]
  _flattened = [i for b in options for i in b]
  d = list(filter(lambda x:x not in options, d))
  if not options or not d:
    yield current+_flattened
  if d and not options:
    yield from cluster(d[1:], d[0])
  elif d:
    yield from cluster(d, current+_flattened)

for a, *b in [[[1,2],[6,4],[2,3],[5,4]], [[1,2],[3,4],[2,3],[5,4]], [[1,2],[3,4],[2,3],[5,4], [10, 11, 12], [13, 15], [4,6], [6, 8], [23,25]]]:
  print([list(set(i)) for i in cluster(b, a)])

Output:

[[1, 2, 3], [4, 5, 6]]
[[1, 2, 3, 4, 5]]
[[1, 2, 3, 4, 5, 6, 8], [10, 11, 12], [13, 15], [25, 23]]


Output:

[set([1, 2, 3, 4, 5, 6, 8]), set([10, 11, 12]), set([13, 15]), set([25, 23])]


This algorithm returns incorrect results. Try o = [[1,2],[6,4],[2,3],[5,4]].

@duhaime Please see my latest edit.

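o = [[1,4],[2,3],[4,5]] results in [{1,4},{2,3},{4,5}].

@niemmi You are right, this input fails. I guess a simple fix would be to sort by maximum value and check whether the result differs from sorting by minimum value; if it does, use the result with fewer items. In any case, the graph-based solution from the thread you linked is far better.

@Asterisk Even with the change you mention, this technique will not maximally merge the groups. I'll post some tests you can run in that thread in a few minutes…

@duhaime I agree. Thanks for letting me know.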
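The linked thread is not reproduced here, so the following is only a sketch of what a graph-based approach might look like, not the code from that thread. It treats every value as a node and every sublist as a chain of edges, then collects the connected components with a union-find (disjoint-set) structure. The function name cluster_by_shared_values is made up for illustration, and the sketch assumes the values are hashable and mutually comparable (as the integers in the question are).

```python
from collections import defaultdict


def cluster_by_shared_values(list_of_lists):
    """Merge sublists that share any element (hypothetical helper).

    Each value is a node; values in the same sublist are linked.
    The result is one sorted list per connected component.
    """
    parent = {}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sublist in list_of_lists:
        for item in sublist:
            parent.setdefault(item, item)
        # Link every other item of the sublist to its first item.
        for item in sublist[1:]:
            root_a, root_b = find(sublist[0]), find(item)
            if root_a != root_b:
                parent[root_b] = root_a

    # Group all values by the root of their component.
    groups = defaultdict(list)
    for item in parent:
        groups[find(item)].append(item)
    return [sorted(group) for group in groups.values()]


print(cluster_by_shared_values([[1, 2], [3, 4], [2, 3], [5, 4]]))
print(cluster_by_shared_values([[1, 4], [2, 3], [4, 5]]))
```

For the question's input this prints [[1, 2, 3, 4, 5]], and for the input from the comments above it groups 1, 4 and 5 together and keeps [2, 3] separate. Each sublist is visited once and the find calls are close to constant time in practice, so the work grows roughly linearly with the total number of values, instead of re-scanning every cluster on each pass.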
