Python: merging lists that share common elements

My input is a list of lists. Some of them share common elements, e.g.

L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
I need to merge all lists that share a common element, and repeat this procedure until no remaining lists share an item. I thought about using boolean operations and a while loop, but couldn't come up with a good solution.

The final result should be:

L = [['a','b','c','d','e','f','g','o','p'],['k']] 

Without knowing exactly what you want, I decided to guess what you meant: find every element exactly once.

#!/usr/bin/python


def clink(l, acc):
  for sub in l:
    if isinstance(sub, list):  # recurse into nested lists
      clink(sub, acc)
    else:
      acc[sub] = 1             # record each element once

def clunk(l):
  acc = {}
  clink(l, acc)
  print(list(acc.keys()))

l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']]

clunk(l)
The output looks like this (the exact order of the elements may vary):

['a', 'c', 'b', 'e', 'd', 'g', 'f', 'k', 'o', 'p']
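For what it's worth, the same flat set of unique elements can also be obtained in one line with itertools, assuming only one level of nesting (a sketch, not part of the original answer):

from itertools import chain

l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']]
print(set(chain.from_iterable(l)))
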
Algorithm:

  • take the first set A from the list
  • for each other set B in the list: if B has a common element with A, join B into A and remove B from the list
  • repeat the previous step until there is no more overlap with A
  • put A into the output
  • repeat from the first step with the rest of the list

Therefore, you may want to use sets instead of lists. The program below should do it:

    l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']]
    
    out = []
    while len(l)>0:
        first, *rest = l
        first = set(first)
    
        lf = -1
        while len(first)>lf:
            lf = len(first)
    
            rest2 = []
            for r in rest:
                if len(first.intersection(set(r)))>0:
                    first |= set(r)
                else:
                    rest2.append(r)     
            rest = rest2
    
        out.append(first)
        l = rest
    
    print(out)
    

    I think this can be solved by modelling the problem as a graph. Each sublist is a node, and it shares an edge with another node only if the two sublists have some element in common. A merged sublist is thus basically a connected component in the graph, and merging them all is simply a matter of finding all connected components and listing them.

    This can be done via a simple traversal over the graph. Both BFS and DFS can be used, but I use DFS here since it is shorter for me. (An iterative BFS variant is sketched after the code below.)

    l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    taken=[False]*len(l)
    l=[set(elem) for elem in l]
    
    def dfs(node,index):
        taken[index]=True
        ret=node
        for i,item in enumerate(l):
            if not taken[i] and not ret.isdisjoint(item):
                ret.update(dfs(item,i))
        return ret
    
    def merge_all():
        ret=[]
        for i,node in enumerate(l):
            if not taken[i]:
                ret.append(list(dfs(node,i)))
        return ret
    
    print(merge_all())
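
    As noted, BFS works just as well. Here is a minimal iterative BFS sketch (my own addition, not part of the original answer); it reuses the list of sets l from above and resets the taken bookkeeping:

    from collections import deque

    taken = [False] * len(l)  # reset the bookkeeping used by the DFS version

    def bfs(start_index):
        taken[start_index] = True
        component = set(l[start_index])
        queue = deque([start_index])
        while queue:
            idx = queue.popleft()
            for i, item in enumerate(l):
                # any untaken set overlapping the set we just dequeued joins the component
                if not taken[i] and not l[idx].isdisjoint(item):
                    taken[i] = True
                    component |= item
                    queue.append(i)
        return component

    print([sorted(bfs(i)) for i in range(len(l)) if not taken[i]])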
    

    You can see your list as a notation for a graph: ['a','b','c'] is a graph with 3 nodes that are all connected to each other. The problem you are trying to solve is finding the connected components of this graph.

    You can use networkx for this, which has the advantage that it is pretty much guaranteed to be correct:

    l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    
    import networkx 
    from networkx.algorithms.components.connected import connected_components
    
    
    def to_graph(l):
        G = networkx.Graph()
        for part in l:
            # each sublist is a bunch of nodes
            G.add_nodes_from(part)
            # it also implies a number of edges:
            G.add_edges_from(to_edges(part))
        return G
    
    def to_edges(l):
        """ 
            treat `l` as a Graph and return its edges 
            to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)]
        """
        it = iter(l)
        last = next(it)
    
        for current in it:
            yield last, current
            last = current    
    
    G = to_graph(l)
    print(list(connected_components(G)))
    # prints the two components, e.g. [{'a', 'b', 'c', 'd', 'e', 'f', 'g', 'o', 'p'}, {'k'}]
    

    To solve this efficiently yourself you would have to convert the list into something graph-ish anyway, so you might as well use networkX from the start.
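    If you would rather avoid the networkx dependency, a plain disjoint-set (union-find) structure is one way to build that graph-ish machinery by hand. A minimal sketch (my own illustration, not from the original answers):

    from collections import defaultdict

    parent = {}

    def find(x):
        # return the representative of x's set, compressing the path as we go
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    for part in l:
        for item in part:
            union(part[0], item)  # every item in a sublist joins the first item's set

    groups = defaultdict(list)
    for item in parent:
        groups[find(item)].append(item)
    print([sorted(g) for g in groups.values()])
    # e.g. [['a', 'b', 'c', 'd', 'e', 'f', 'g', 'o', 'p'], ['k']]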

    I came across the same issue of trying to merge down lists with common values. This example may be what you are looking for. It only loops over the lists once and updates the result set as it goes.

    lists = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    lists = sorted([sorted(x) for x in lists]) # Sort the sublists and the outer list so you don't miss matches. Trust me, it needs to be done.
    
    resultlist = [] # Create the empty result list.
    
    if len(lists) >= 1: # If your list is empty then you don't need to do anything.
        resultlist = [lists[0]] # Add the first item to your result set.
        if len(lists) > 1: # If there is only one list in your list then you don't need to do anything else.
            for l in lists[1:]: # Loop through the lists starting at list 1.
                listset = set(l) # Turn your list into a set.
                merged = False # Trigger.
                for index in range(len(resultlist)): # Use indexes of the list for speed.
                    rset = set(resultlist[index]) # Get a list from your result set as a set.
                    if len(listset & rset) != 0: # If listset and rset have a common value, their intersection is non-empty.
                        resultlist[index] = list(listset | rset) # Update the result list with the union of listset and rset.
                        merged = True # Turn trigger to True.
                        break # Because you found a match there is no need to continue the for loop.
                if not merged: # If there was no match then add the list to the result set, so it doesn't get left out.
                    resultlist.append(l)
    print(resultlist)
    
    My attempt. It has a functional look to it:

    #!/usr/bin/python
    from collections import defaultdict
    from functools import reduce  # needed on Python 3
    l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    hashdict = defaultdict(int)
    
    def hashit(x, y):
        # count how often each element appears across the sublists
        for i in y: x[i] += 1
        return x
    
    def merge(x, y):
        # sublists whose elements appear more than once overall go into the first set,
        # the rest into the second
        sums = sum([hashdict[i] for i in y])
        if sums > len(y):
            x[0] = x[0].union(y)
        else:
            x[1] = x[1].union(y)
        return x
    
    
    hashdict = reduce(hashit, l, hashdict)
    sets = reduce(merge, l, [set(), set()])
    print([list(sets[0]), list(sets[1])])
    
    Since you are looking for connected components in a graph, here is how you could implement it without using a graph library:

    from collections import defaultdict
    
    def connected_components(lists):
        neighbors = defaultdict(set)
        seen = set()
        for each in lists:
            for item in each:
                neighbors[item].update(each)
        def component(node, neighbors=neighbors, seen=seen, see=seen.add):
            nodes = set([node])
            next_node = nodes.pop
            while nodes:
                node = next_node()
                see(node)
                nodes |= neighbors[node] - seen
                yield node
        for node in neighbors:
            if node not in seen:
                yield sorted(component(node))
    
    L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    print(list(connected_components(L)))
    

    This is perhaps a simpler and faster algorithm, and it seems to work well:

    l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']]
    
    len_l = len(l)
    i = 0
    while i < (len_l - 1):
        for j in range(i + 1, len_l):
    
            # i,j iterate over all pairs of l's elements including new 
            # elements from merged pairs. We use len_l because len(l)
            # may change as we iterate
            i_set = set(l[i])
            j_set = set(l[j])
    
            if len(i_set.intersection(j_set)) > 0:
                # Remove these two from list
                l.pop(j)
                l.pop(i)
    
                # Merge them and append to the orig. list
                ij_union = list(i_set.union(j_set))
                l.append(ij_union)
    
                # len(l) has changed
                len_l -= 1
    
                # adjust 'i' because elements shifted
                i -= 1
    
                # abort inner loop, continue with next l[i]
                break
    
        i += 1
    
    print(l)
    # prints [['k'], ['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p']]
    
    I found itertools a fast option for merging lists, and it solved this problem for me:

    import itertools
    
    LL = set(itertools.chain.from_iterable(L)) 
    # LL is {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'k', 'o', 'p'}
    
    for each in LL:
      components = [x for x in L if each in x]
      for i in components:
        L.remove(i)
      L += [list(set(itertools.chain.from_iterable(components)))]
    
    # then L = [['k'], ['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p']]
    

    For large sets, sorting LL by frequency, from the most common element to the least common, can speed this up a bit.
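
    A minimal sketch of that frequency ordering (my own illustration using collections.Counter; not part of the original answer):

    from collections import Counter
    from itertools import chain

    L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    counts = Counter(chain.from_iterable(L))
    # unique elements ordered from most to least frequent
    LL = [elem for elem, _ in counts.most_common()]
    print(LL)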

    I needed to perform the clustering technique described by the OP millions of times on rather large lists, and therefore wanted to determine which of the methods suggested above is both most accurate and most performant.

    I ran 10 trials for input lists ranging in size from 2^1 to 2^10, using the same input list for each method, and measured the average runtime in milliseconds for each of the algorithms above.

    These results helped me see that, of the methods that consistently return correct results, @jochen's is the fastest. Of the methods that do not consistently return correct results, mak's solution often fails to include all of the input elements (i.e. some list members are missing), and the solutions of braaksma, cmangla, and asterisk are not guaranteed to be maximally merged.

    Interestingly, the two fastest correct algorithms also have the top two vote counts to date, in properly ranked order.

    Here is the code used to run the tests: the Python benchmark harness, followed by the R/ggplot2 code used to plot the average runtimes.

    from networkx.algorithms.components.connected import connected_components
    from itertools import chain
    from random import randint, random
    from collections import defaultdict, deque
    from copy import deepcopy
    from functools import reduce  # used by the stiltskin() candidate below (Python 3)
    from multiprocessing import Pool
    import networkx
    import datetime
    import os
    
    ##
    # @mimomu
    ##
    
    def mimomu(l):
      l = deepcopy(l)
      s = set(chain.from_iterable(l))
      for i in s:
        components = [x for x in l if i in x]
        for j in components:
          l.remove(j)
        l += [list(set(chain.from_iterable(components)))]
      return l
    
    ##
    # @Howard
    ##
    
    def howard(l):
      out = []
      while len(l)>0:
          first, *rest = l
          first = set(first)
    
          lf = -1
          while len(first)>lf:
              lf = len(first)
    
              rest2 = []
              for r in rest:
                  if len(first.intersection(set(r)))>0:
                      first |= set(r)
                  else:
                      rest2.append(r)
              rest = rest2
    
          out.append(first)
          l = rest
      return out
    
    ##
    # Nx @Jochen Ritzel
    ##
    
    def jochen(l):
      l = deepcopy(l)
    
      def to_graph(l):
          G = networkx.Graph()
          for part in l:
              # each sublist is a bunch of nodes
              G.add_nodes_from(part)
              # it also implies a number of edges:
              G.add_edges_from(to_edges(part))
          return G
    
      def to_edges(l):
          """
              treat `l` as a Graph and return its edges
              to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)]
          """
          it = iter(l)
          last = next(it)
    
          for current in it:
              yield last, current
              last = current
    
      G = to_graph(l)
      return list(connected_components(G))
    
    ##
    # Merge all @MAK
    ##
    
    def mak(l):
      l = deepcopy(l)
      taken=[False]*len(l)
      l=map(set,l)
    
      def dfs(node,index):
          taken[index]=True
          ret=node
          for i,item in enumerate(l):
              if not taken[i] and not ret.isdisjoint(item):
                  ret.update(dfs(item,i))
          return ret
    
      def merge_all():
          ret=[]
          for i,node in enumerate(l):
              if not taken[i]:
                  ret.append(list(dfs(node,i)))
          return ret
    
      result = list(merge_all())
      return result
    
    ##
    # @cmangla
    ##
    
    def cmangla(l):
      l = deepcopy(l)
      len_l = len(l)
      i = 0
      while i < (len_l - 1):
        for j in range(i + 1, len_l):
          # i,j iterate over all pairs of l's elements including new
          # elements from merged pairs. We use len_l because len(l)
          # may change as we iterate
          i_set = set(l[i])
          j_set = set(l[j])
    
          if len(i_set.intersection(j_set)) > 0:
            # Remove these two from list
            l.pop(j)
            l.pop(i)
    
            # Merge them and append to the orig. list
            ij_union = list(i_set.union(j_set))
            l.append(ij_union)
    
            # len(l) has changed
            len_l -= 1
    
            # adjust 'i' because elements shifted
            i -= 1
    
            # abort inner loop, continue with next l[i]
            break
    
          i += 1
      return l
    
    ##
    # @pillmuncher
    ##
    
    def pillmuncher(l):
      l = deepcopy(l)
    
      def connected_components(lists):
        neighbors = defaultdict(set)
        seen = set()
        for each in lists:
            for item in each:
                neighbors[item].update(each)
        def component(node, neighbors=neighbors, seen=seen, see=seen.add):
            nodes = set([node])
            next_node = nodes.pop
            while nodes:
                node = next_node()
                see(node)
                nodes |= neighbors[node] - seen
                yield node
        for node in neighbors:
            if node not in seen:
                yield sorted(component(node))
    
      return list(connected_components(l))
    
    ##
    # @NicholasBraaksma
    ##
    
    def braaksma(l):
      l = deepcopy(l)
      lists = sorted([sorted(x) for x in l]) #Sorts lists in place so you dont miss things. Trust me, needs to be done.
    
      resultslist = [] #Create the empty result list.
    
      if len(lists) >= 1: # If your list is empty then you dont need to do anything.
          resultlist = [lists[0]] #Add the first item to your resultset
          if len(lists) > 1: #If there is only one list in your list then you dont need to do anything.
              for l in lists[1:]: #Loop through lists starting at list 1
                  listset = set(l) #Turn you list into a set
                  merged = False #Trigger
                  for index in range(len(resultlist)): #Use indexes of the list for speed.
                      rset = set(resultlist[index]) #Get list from you resultset as a set
                      if len(listset & rset) != 0: #If listset and rset have a common value then the len will be greater than 1
                          resultlist[index] = list(listset | rset) #Update the resultlist with the updated union of listset and rset
                          merged = True #Turn trigger to True
                          break #Because you found a match there is no need to continue the for loop.
                  if not merged: #If there was no match then add the list to the resultset, so it doesnt get left out.
                      resultlist.append(l)
      return resultlist
    
    ##
    # @Rumple Stiltskin
    ##
    
    def stiltskin(l):
      l = deepcopy(l)
      hashdict = defaultdict(int)
    
      def hashit(x, y):
          for i in y: x[i] += 1
          return x
    
      def merge(x, y):
          sums = sum([hashdict[i] for i in y])
          if sums > len(y):
              x[0] = x[0].union(y)
          else:
              x[1] = x[1].union(y)
          return x
    
      hashdict = reduce(hashit, l, hashdict)
      sets = reduce(merge, l, [set(),set()])
      return list(sets)
    
    ##
    # @Asterisk
    ##
    
    def asterisk(l):
      l = deepcopy(l)
      results = {}
      for sm in ['min', 'max']:
        sort_method = min if sm == 'min' else max
        l = sorted(l, key=lambda x:sort_method(x))
        queue = deque(l)
    
        grouped = []
        while len(queue) >= 2:
          l1 = queue.popleft()
          l2 = queue.popleft()
          s1 = set(l1)
          s2 = set(l2)
    
          if s1 & s2:
            queue.appendleft(s1 | s2)
          else:
            grouped.append(s1)
            queue.appendleft(s2)
        if queue:
          grouped.append(queue.pop())
        results[sm] = grouped
      if len(results['min']) < len(results['max']):
        return results['min']
      return results['max']
    
    ##
    # Validate no more clusters can be merged
    ##
    
    def validate(output, L):
      # validate all sublists are maximally merged
      d = defaultdict(list)
      for idx, i in enumerate(output):
        for j in i:
          d[j].append(i)
      if any([len(i) > 1 for i in d.values()]):
        return 'not maximally merged'
      # validate all items in L are accounted for
      all_items = set(chain.from_iterable(L))
      accounted_items = set(chain.from_iterable(output))
      if all_items != accounted_items:
        return 'missing items'
      # validate results are good
      return 'true'
    
    ##
    # Timers
    ##
    
    def time(func, L):
      start = datetime.datetime.now()
      result = func(L)
      delta = datetime.datetime.now() - start
      return result, delta
    
    ##
    # Function runner
    ##
    
    def run_func(args):
      func, L, input_size = args
      results, elapsed = time(func, L)
      validation_result = validate(results, L)
      return func.__name__, input_size, elapsed, validation_result
    
    ##
    # Main
    ##
    
    all_results = defaultdict(lambda: defaultdict(list))
    funcs = [mimomu, howard, jochen, mak, cmangla, braaksma, asterisk]
    args = []
    
    for trial in range(10):
      for s in range(10):
        input_size = 2**s
    
        # get some random inputs to use for all trials at this size
        L = []
        for i in range(input_size):
          sublist = []
          for j in range(randint(5, 10)):
            sublist.append(randint(0, 2**24))
          L.append(sublist)
        for i in funcs:
          args.append([i, L, input_size])
    
    pool = Pool()
    for result in pool.imap(run_func, args):
      func_name, input_size, elapsed, validation_result = result
      all_results[func_name][input_size].append({
        'time': elapsed,
        'validation': validation_result,
      })
      # show the running time for the function at this input size
      print(input_size, func_name, elapsed, validation_result)
    pool.close()
    pool.join()
    
    # write the average of time trials at each size for each function
    with open('times.tsv', 'w') as out:
      for func in all_results:
        validations = [i['validation'] for j in all_results[func] for i in all_results[func][j]]
        linetype = 'incorrect results' if any([i != 'true' for i in validations]) else 'correct results'
    
        for input_size in all_results[func]:
          all_times = [i['time'].microseconds for i in all_results[func][input_size]]
          avg_time = sum(all_times) / len(all_times)
    
          out.write(func + '\t' + str(input_size) + '\t' + \
            str(avg_time) + '\t' + linetype + '\n')
    
    
    library(ggplot2)
    df <- read.table('times.tsv', sep='\t')
    
    p <- ggplot(df, aes(x=V2, y=V3, color=as.factor(V1))) +
      geom_line() +
      xlab('number of input lists') +
      ylab('runtime (ms)') +
      labs(color='') +
      scale_x_continuous(trans='log10') +
      facet_wrap(~V4, ncol=1)
    
    ggsave('runtimes.png')
    
    def merge_overlapping_sublists(lst):
        output, refs = {}, {}
        for index, sublist in enumerate(lst):
            output[index] = set(sublist)
            for elem in sublist:
                refs[elem] = index
        changes = True
        while changes:
            changes = False
            for ref_num, sublist in list(output.items()):
                for elem in sublist:
                    current_ref_num = refs[elem]
                    if current_ref_num != ref_num:
                        changes = True
                        output[current_ref_num] |= sublist
                        for elem2 in sublist:
                            refs[elem2] = current_ref_num
                        output.pop(ref_num)
                        break
        return list(output.values())
    
    def compare(a, b):
        b = list(b)  # work on a copy so the expected list is not consumed
        try:
            for elem in a:
                b.remove(elem)
        except ValueError:
            return False
        return not b
    
    import random
    lst = [["a", "b"], ["b", "c"], ["c", "d"], ["d", "e"]]
    random.shuffle(lst)
    assert compare(merge_overlapping_sublists(lst), [{"a", "b", "c", "d", "e"}])
    lst = [["a", "b"], ["b", "c"], ["f", "d"], ["d", "e"]]
    random.shuffle(lst)
    assert compare(merge_overlapping_sublists(lst), [{"a", "b", "c",}, {"d", "e", "f"}])
    lst = [["a", "b"], ["k", "c"], ["f", "g"], ["d", "e"]]
    random.shuffle(lst)
    assert compare(merge_overlapping_sublists(lst), [{"a", "b"}, {"k", "c"}, {"f", "g"}, {"d", "e"}])
    lst = [["a", "b", "c"], ["b", "d", "e"], ["k"], ["o", "p"], ["e", "f"], ["p", "a"], ["d", "g"]]
    random.shuffle(lst)
    assert compare(merge_overlapping_sublists(lst), [{"k"}, {"a", "c", "b", "e", "d", "g", "f", "o", "p"}])    
    lst = [["a", "b"], ["b", "c"], ["a"], ["a"], ["b"]]
    random.shuffle(lst)
    assert compare(merge_overlapping_sublists(lst), [{"a", "b", "c"}])
    
    #your list
    l=[['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    
    #imports
    from itertools import product, groupby
    
    #inner lists to sets (to list of sets)
    l=[set(x) for x in l]
    
    #cartesian product merging elements if some element in common
    for a,b in product(l,l):
        if a.intersection( b ):
           a.update(b)
           b.update(a)
    
    #back to list of lists
    l = sorted( [sorted(list(x)) for x in l])
    
    #remove dups
    l = [g for g, _ in groupby(l)]
    
    #result
    [['a', 'b', 'c', 'd', 'e', 'f', 'g', 'o', 'p'], ['k']]
    
    import networkx as nx
    
    L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    
    G = nx.Graph()
    
    #Add nodes to Graph    
    G.add_nodes_from(sum(L, []))
    
    #Create edges from list of nodes
    q = [[(s[i],s[i+1]) for i in range(len(s)-1)] for s in L]
    
    for i in q:
    
        #Add edges to Graph
        G.add_edges_from(i)
    
    #Find all connected components in the graph and list the nodes of each component
    [list(i) for i in nx.connected_components(G)]
    
    [['p', 'c', 'f', 'g', 'o', 'a', 'd', 'b', 'e'], ['k']]
    
    def cluser_combine(groups):
        n_groups=len(groups)
    
        #first, we put all elements that appear in 'groups' into 'elements'.
        elements=list(set.union(*[set(g) for g in groups]))
        #and sort elements.
        elements.sort()
        n_elements=len(elements)
    
        #I create a list called 'labels'; this is the key of this algorithm.
        #I was inspired by the sklearn KMeans implementation,
        #which has an attribute called labels_:
        #https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
        #I called this algorithm "cluster combine" because of this inspiration.
        labels=list(range(n_elements))
    
    
        #for each group, I get the 'indices' of its members in 'elements',
        #then get the labels at those indices,
        #take the min of those labels as the new label for all of them,
        #and replace every label in labels_for_group with that new label.

        #in other words, in each iteration
        #I try to combine two or more existing groups:
        #if a group spans labels 0 and 2,
        #I take the new label 0 (the min of the two)
        #and replace both with 0.
        for i in range(n_groups):
    
            #if there are zero or one elements in the group, skip it
            if len(groups[i])<=1:
                continue
    
            indices=list(map(elements.index, groups[i]))
    
            labels_for_group=list(set([labels[i] for i in indices]))
            #if there is only one label, all elements in the group already have the same label, so skip it.
            if len(labels_for_group)==1:
    
                continue
    
            labels_for_group.sort()
            label=labels_for_group[0]
    
            #combine
            for k in range(n_elements):
                if labels[k] in labels_for_group[1:]:
                    labels[k]=label
    
    
        new_groups=[]
        for label in set(labels):
            new_group = [elements[i] for i, v in enumerate(labels) if v == label]
            new_groups.append(new_group)
    
        return new_groups
    
    cluser_combine([['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']])
    
    orig = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g'], ['k'],['k'],['k']]
    
    def merge_lists(orig):
        def step(orig): 
            mid = []
            mid.append(orig[0])
            for i in range(len(mid)):            
                for j in range(1,len(orig)):                
                    for k in orig[j]:
                        if k in mid[i]:                
                            mid[i].extend(orig[j])                
                            break
                        elif k == orig[j][-1] and orig[j] not in mid:
                            mid.append(orig[j])                        
            mid = [sorted(list(set(x))) for x in mid]
            return mid
    
        result = step(orig)
        while result != step(result):                    
            result = step(result)                  
        return result
    
    merge_lists(orig)
    [['a', 'b', 'c', 'd', 'e', 'f', 'g', 'o', 'p'], ['k']]