Python: return a list of paths to leaf nodes from a nested list


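I have a dynamic tree structure represented as a list of lists. Below is one such example, laid out with whitespace to show the structure: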

[['first',
    [0, 'list1'],
    [1, 'list2'],
    [2, 'list3']],
 ['second',
    ['second_subda',
        [0, 'tup1'],
        [1, 'tup2']],
    ['second_subdb',
        [0, 'tup3'],
        [1, 'tup4']]],
 ['third',
    ['third_subda',
        [0, 'a'],
        [1, 'b'],
        [2, 'c'],
        [3, ['d',
            [0, 'e'],
            [1, 'f'],
            [2, ['g',
                [0, 1],
                [1, 2],
                [2, 3]]]]]]]]

I would like to extract all the leaf nodes from it, along with the paths required to reach them:

e.g. from the structure above, I would like to return:

[('list1', ['first', 0]),
 ('list2', ['first', 1]),
 ('list3', ['first', 2]),
 ('tup1', ['second', 'second_subda', 0]),
 ('tup2', ['second', 'second_subda', 1]),
 ('tup3', ['second', 'second_subdb', 0]),
 ('tup4', ['second', 'second_subdb', 1]),
 ('a', ['third', 'third_subda', 0]),
 ('b', ['third', 'third_subda', 1]),
 ('c', ['third', 'third_subda', 2]),
 ('e', ['third', 'third_subda', 3, 'd', 0]),
 ('f', ['third', 'third_subda', 3, 'd', 1]),
 (1, ['third', 'third_subda', 3, 'd', 2, 'g', 0]),
 (2, ['third', 'third_subda', 3, 'd', 2, 'g', 1]),
 (3, ['third', 'third_subda', 3, 'd', 2, 'g', 2])]

i.e. for each "leaf" I want to extract a tuple consisting of the leaf value and all the leading list items that describe the unique path to that leaf. I should be left with a list of these tuples, where the number of items in the list corresponds to the number of leaf nodes in the tree.


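I have tried building this tree in modules like networkx, but the overhead of an extra module is too much for my use case. I would like to stick to plain Python code as far as possible.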

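First off, if you can, use a dict of dicts rather than a list of lists. Dictionaries have constant-time key lookup on average, whereas finding a named child in a list requires a linear scan.

As for your question, whenever you are dealing with dynamic trees, recursion is usually the way to go.

This works for your tree: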

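def get_leaf_paths(children: list, path_prefix: list = None, acc: list = None):
    # Use None defaults: a mutable default such as [] is shared between calls,
    # so results from an earlier call would leak into the next one.
    path_prefix = [] if path_prefix is None else path_prefix
    acc = [] if acc is None else acc
    for child in children:
        path = path_prefix + [child[0]]  # child[0] is the node name or index
        if isinstance(child[1], list):
            # child[1:] holds the subtrees below this node
            get_leaf_paths(child[1:], path, acc)
        else:
            # [key, value] pair: child[1] is the leaf value
            acc.append((child[1], path))
    return acc

# tree is the nested list from the question
get_leaf_paths(tree)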
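However, this is ugly, and for good reason. Python does not expect you to implement a tree like this when a dict structure is a better fit. For example, referring to the leaf value by index (child[1]) is undesirable, and keeping the node name and its children in the same list is also problematic (it forces you to iterate over child[1:] to reach the children, which is not descriptive). Calls to isinstance are also best avoided in good Python code, but we need one here to check whether we have a leaf.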

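Best practice says that a leaf should be a node with None as its children, which makes checking for leaf status easier. If we implement the same thing with a dict of dicts and None children, the function cleans up to: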

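def get_leaf_paths_dict(tree: dict, path: list = None, acc: list = None):
    # None defaults avoid sharing one list between separate calls
    path = [] if path is None else path
    acc = [] if acc is None else acc
    for node, children in tree.items():
        if children:  # not a leaf
            get_leaf_paths_dict(children, path + [node], acc)
        else:
            acc.append((node, path))
    return acc

get_leaf_paths_dict(tree)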
This reads much better. To be clear, for the second one to work, the tree has to be changed to a dict of dicts, i.e.:

{'first':  {0: {'list1': None},
            1: {'list2': None},
            2: {'list3': None}},
 'second': { ... etc.
As a bonus, if you build the tree like this, you can import it into Networkx with the function nx.from_dict_of_dicts and from there do everything the Networkx API offers.

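Finally, I realize that if you are new to functional programming, both functions I gave may need some explanation. Recursion over a tree works by noticing that every subtree of the tree can itself be treated as a tree, so we can save many lines of code by having the function call itself, passing along the accumulated list of paths and the current path so that any new paths get appended to it.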
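To make the recursive idea concrete, here is a minimal sketch on a small dict-of-dicts tree whose leaves have None as their children. The name leaf_paths and the sample tree are illustrative assumptions, not part of the functions above. Each call treats the subtree it receives as a tree in its own right and extends the path by one key; this variant returns its results instead of filling a shared accumulator.

```python
def leaf_paths(tree, path=()):
    """Return (leaf, path) pairs for a dict-of-dicts tree with None-valued leaves."""
    found = []
    for node, children in tree.items():
        if children is None:
            # Leaf: record it together with the path that led here.
            found.append((node, list(path)))
        else:
            # Subtree: recurse, extending the path by the current key.
            found.extend(leaf_paths(children, path + (node,)))
    return found


# Hypothetical sample tree, a cut-down version of the structure in the question.
tree = {
    'first': {0: {'list1': None}, 1: {'list2': None}},
    'second': {'second_subda': {0: {'tup1': None}}},
}
print(leaf_paths(tree))
# [('list1', ['first', 0]), ('list2', ['first', 1]), ('tup1', ['second', 'second_subda', 0])]
```

The path is passed as a tuple so that no call can accidentally mutate a list owned by its caller.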

Edit: I'll even throw in the function to convert to the dict for free (note the similarity):
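The converter itself is not shown here. One possible sketch, which is an assumption and not the answerer's original code, walks the nested list the same way as the leaf-path function and builds the dict of dicts with None-valued leaves. The name to_dict and the sample nested list are hypothetical, and leaf values are assumed to be hashable because they become dict keys.

```python
def to_dict(children):
    """Convert [name, child, ...] lists into a dict of dicts with None leaves."""
    result = {}
    for name, *rest in children:
        if len(rest) == 1 and not isinstance(rest[0], list):
            # [key, value] pair: the value is a leaf, stored as {value: None}.
            result[name] = {rest[0]: None}
        else:
            # Everything after the name is a list of subtrees.
            result[name] = to_dict(rest)
    return result


# Hypothetical sample, a cut-down version of the structure in the question.
nested = [['first', [0, 'list1'], [1, 'list2']],
          ['second', ['second_subda', [0, 'tup1'], [1, 'tup2']]]]
print(to_dict(nested))
```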


You can use recursion with a generator:

data = [['first', [0, 'list1'], [1, 'list2'], [2, 'list3']], ['second', ['second_subda', [0, 'tup1'], [1, 'tup2']], ['second_subdb', [0, 'tup3'], [1, 'tup4']]], ['third', ['third_subda', [0, 'a'], [1, 'b'], [2, 'c'], [3, ['d', [0, 'e'], [1, 'f'], [2, ['g', [0, 1], [1, 2], [2, 3]]]]]]]]
def get_paths(d, c = []):
  for a, *b in d:
    if len(b) == 1 and not isinstance(b[0], list):
      yield (b[0], c+[a])
    else:
      yield from get_paths(b, c+[a])

print(list(get_paths(data)))
Output:

[('list1', ['first', 0]), 
 ('list2', ['first', 1]), 
 ('list3', ['first', 2]), 
 ('tup1', ['second', 'second_subda', 0]), 
 ('tup2', ['second', 'second_subda', 1]), 
 ('tup3', ['second', 'second_subdb', 0]), 
 ('tup4', ['second', 'second_subdb', 1]), 
 ('a', ['third', 'third_subda', 0]), 
 ('b', ['third', 'third_subda', 1]), 
 ('c', ['third', 'third_subda', 2]), 
 ('e', ['third', 'third_subda', 3, 'd', 0]), 
 ('f', ['third', 'third_subda', 3, 'd', 1]), 
 (1, ['third', 'third_subda', 3, 'd', 2, 'g', 0]), 
 (2, ['third', 'third_subda', 3, 'd', 2, 'g', 1]), 
 (3, ['third', 'third_subda', 3, 'd', 2, 'g', 2])]

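Thanks for the answer. The reason for starting from a list of lists is that my starting point is an arbitrarily complex Python object made up of nested dicts, lists, tuples and so on, and I want to build an extraction index over it based on search terms identified at the leaves. Once this index is built, I can search the first element of each tuple and get back the path needed to find (and edit) the part of the object being modified.

Thanks for this answer, it looks lovely and elegant. I take it the a, *b in the loop is a way of splitting the list into its [0]th element and its [1:] remainder, is that right? I don't think I've seen that construct before.

@Thomas Kimber: You are exactly right. a, *b is called unpacking, where a is the first element of the item produced by the iterator and b is the rest of it (element[1:]).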
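As a small illustration of the unpacking described in the comment above (the variable names row, key and rest are made up for this example), the starred target collects everything after the first element into a list:

```python
row = ['second_subda', [0, 'tup1'], [1, 'tup2']]
a, *b = row
print(a)  # second_subda
print(b)  # [[0, 'tup1'], [1, 'tup2']]

# Equivalent to indexing and slicing:
assert a == row[0] and b == row[1:]

# For a [key, leaf] pair, the starred part is a one-element list holding the leaf.
key, *rest = [0, 'list1']
print(key, rest)  # 0 ['list1']
```

This is why the generator checks len(b) == 1 and that b[0] is not a list before treating b[0] as a leaf.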