Python file parsing: building a tree from a text file


python algorithm recursion tree


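I have an indented text file that will be used to build a tree. Each line represents a node, and the indentation represents the depth as well as which node the current node is a child of.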

For example, a file might look like this (one tab per indentation level):

ROOT
    Node1
        Node2
            Node3
                Node4
    Node5
    Node6

This means that ROOT has three children (Node1, Node5 and Node6), Node1 has one child (Node2), Node2 has one child (Node3), and so on.

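I've come up with a recursive algorithm and programmed it, and it works, but it's pretty ugly and handles the example above especially crudely (going from Node4 to Node5).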

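It uses the indent count as the basis for recursion: if the number of indents equals the current depth + 1, I go one level deeper. But this means that when I read a line with fewer indents, I have to come back up one level at a time, checking the depth each time.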
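To make the problem concrete, here is a small sketch (not the asker's code) that computes the depth of each line in the example by counting leading tabs, assuming one tab per level, and reports every place where the depth drops. The only drop is from Node4 (depth 4) to Node5 (depth 1), which is why a recursion that climbs one level at a time has to return three times at that point.

```python
# Example file from the question, assuming one tab per indentation level.
EXAMPLE = "ROOT\n\tNode1\n\t\tNode2\n\t\t\tNode3\n\t\t\t\tNode4\n\tNode5\n\tNode6"

def leading_tabs(line):
    """Count only the tabs at the start of the line."""
    return len(line) - len(line.lstrip("\t"))

def depth_drops(text):
    """Return (previous, current, levels) for each line shallower than the one before it."""
    lines = text.split("\n")
    drops = []
    for prev, cur in zip(lines, lines[1:]):
        d_prev, d_cur = leading_tabs(prev), leading_tabs(cur)
        if d_cur < d_prev:
            drops.append((prev.strip(), cur.strip(), d_prev - d_cur))
    return drops

print(depth_drops(EXAMPLE))  # [('Node4', 'Node5', 3)]
```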

Here's what I have:

def _recurse_tree(node, parent, depth):
    tabs = 0
    while node:
        tabs = node.count("\t")
        if tabs == depth:
            print("%s: %s" % (parent.strip(), node.strip()))
        elif tabs == depth + 1:
            node = _recurse_tree(node, prev, depth+1)
            tabs = node.count("\t")

            # check if we have to surface some more
            if tabs == depth:
                print("%s: %s" % (parent.strip(), node.strip()))
            else:
                return node
        else:
            return node

        prev = node
        node = inFile.readline().rstrip()

inFile = open("test.txt")
root = inFile.readline().rstrip()
node = inFile.readline().rstrip()
_recurse_tree(node, root, 1)

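Right now I'm just printing out the nodes to verify that the parent is correct for each line, but maybe there is a cleaner way to do it? Especially the case in the elif block, when I'm coming back from each recursive call.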

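I wouldn't use recursion for something like this at all (OK, maybe I would if I were writing this in a language like Scheme, but this is Python). Recursion is great for iterating over data that is shaped like a tree, and in those cases it simplifies your design greatly compared to plain loops.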

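However, that is not the case here. Your data certainly represents a tree, but it is formatted sequentially, i.e. it is a simple sequence of lines. Such data is most easily processed with a simple loop, although you can make the design more general, if you wish, by separating it into three layers: the sequential reader (which parses the tabs as a specification of the depth level), the tree inserter (which inserts a node into the tree at a given depth level by keeping track of the last node inserted), and the tree itself: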

import re

# *** Tree representation ***
class Node(object):
    def __init__(self, title):
        self.title = title
        self.parent = None
        self.children = []

    def add(self, child):
        self.children.append(child)
        child.parent = self

# *** Node insertion logic ***
class Inserter(object):
    def __init__(self, node, depth = 0):
        self.node = node
        self.depth = depth

    def __call__(self, title, depth):
        newNode = Node(title)
        if (depth > self.depth):
            self.node.add(newNode)
            self.depth = depth
        elif (depth == self.depth):
            self.node.parent.add(newNode)
        else:
            parent = self.node.parent
            for _ in range(self.depth - depth):
                parent = parent.parent
            parent.add(newNode)
            self.depth = depth

        self.node = newNode

# *** File iteration logic ***
with open(r'tree.txt', 'r') as f:    
    tree = Node(f.readline().rstrip('\n'))
    inserter = Inserter(tree)

    for line in f:
        line = line.rstrip('\n')
        # note there's a bug with your original tab parsing code:
        # it would count all tabs in the string, not just the ones
        # at the beginning
        tabs = re.match('\t*', line).group(0).count('\t')
        title = line[tabs:]
        inserter(title, tabs)
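The comment in the code above notes that counting every tab in a line also counts tabs inside the title. A quick check on a made-up line whose title contains a tab shows the difference; the regex from the answer measures only the indentation (taking len() of the match gives the same number as calling .count('\t') on it):

```python
import re

line = "\tNode1\twith a tab in its title"   # hypothetical input line

all_tabs = line.count("\t")                      # also counts the tab inside the title
leading = len(re.match("\t*", line).group(0))    # counts only the indentation

print(all_tabs, leading)      # 2 1
print(repr(line[leading:]))   # 'Node1\twith a tab in its title'
```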

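Since I had to test the code before pasting it here, I wrote a very simple function to pretty-print the tree I had read into memory. For this function the most natural thing is, of course, to use recursion, because now the tree really is represented as tree data: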

def print_tree(node, depth = 0):
    print('%s%s' % ('  ' * depth, node.title))
    for child in node.children:
        print_tree(child, depth + 1)

print_tree(tree)

The big issue is the "lookahead", which I think caused the ugliness in question. It can be shortened slightly:

def _recurse_tree(parent, depth, source):
    last_line = source.readline().rstrip()
    while last_line:
        tabs = last_line.count('\t')
        if tabs < depth:
            break
        node = last_line.strip()
        if tabs >= depth:
            if parent is not None:
                print("%s: %s" % (parent, node))
            last_line = _recurse_tree(node, tabs+1, source)
    return last_line

inFile = open("test.txt")
_recurse_tree(None, 0, inFile)
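One way to keep the one-line lookahead without passing the pending line back through every return value is to store the source and that pending line on a small parser object. The sketch below is an illustration under stated assumptions, not code from the thread: the class name LineParser is made up, it assumes one tab per level, it accepts any iterable of lines (an open file works too), it collects (parent, child) pairs instead of printing them, and it skips blank lines, whereas the function above stops at the first blank line.

```python
class LineParser:
    """Recursive parser that keeps one line of lookahead on the instance (hypothetical helper)."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.pending = None   # the looked-ahead line, or None at end of input
        self.pairs = []       # (parent, child) pairs found so far
        self._advance()

    def _advance(self):
        # Move to the next non-blank line; None marks end of input.
        for raw in self._lines:
            raw = raw.rstrip("\n")
            if raw.strip():
                self.pending = raw
                return
        self.pending = None

    @staticmethod
    def _depth(line):
        # Number of leading tabs (assumes one tab per level).
        return len(line) - len(line.lstrip("\t"))

    def parse(self, parent=None, depth=0):
        # Consume every line at depth >= `depth`; the lines below each
        # node are handled by a recursive call for that node.
        while self.pending is not None and self._depth(self.pending) >= depth:
            node_depth = self._depth(self.pending)
            node = self.pending.strip()
            if parent is not None:
                self.pairs.append((parent, node))
            self._advance()
            self.parse(node, node_depth + 1)
        return self.pairs


text = "ROOT\n\tNode1\n\t\tNode2\n\t\t\tNode3\n\t\t\t\tNode4\n\tNode5\n\tNode6"
for parent, child in LineParser(text.split("\n")).parse():
    print("%s: %s" % (parent, child))
```

With a file, the same call would be LineParser(f).parse() inside a with open(...) as f: block.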


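Since we're talking about recursion, I took care to avoid any global variables (source and last_line). It would be more Pythonic to make them members of some parser object.

If you aren't insisting on recursion, this works too: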

from itertools import takewhile

is_tab = '\t'.__eq__

def build_tree(lines):
    lines = iter(lines)
    stack = []
    for line in lines:
        indent = len(list(takewhile(is_tab, line)))
        stack[indent:] = [line.lstrip()]
        print(stack)

source = '''ROOT
\tNode1
\t\tNode2
\t\t\tNode3
\t\t\t\tNode4
\tNode5
\tNode6'''

build_tree(source.split('\n'))

Result:

['ROOT']
['ROOT', 'Node1']
['ROOT', 'Node1', 'Node2']
['ROOT', 'Node1', 'Node2', 'Node3']
['ROOT', 'Node1', 'Node2', 'Node3', 'Node4']
['ROOT', 'Node5']
['ROOT', 'Node6']
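The same slice-assignment trick can build an actual tree instead of printing stacks: just before stack[indent:] is overwritten, stack[indent - 1] holds the most recent node one level up, which is the new node's parent. A minimal sketch, assuming a single unindented root line and one tab per level; the function name build_tree_dicts and the {'title': ..., 'children': [...]} node layout are made up for illustration. A line that jumps down more than one level raises IndexError here.

```python
from itertools import takewhile

is_tab = '\t'.__eq__

def build_tree_dicts(lines):
    """Return the root node of the tree described by tab-indented lines."""
    stack = []
    root = None
    for line in lines:
        indent = len(list(takewhile(is_tab, line)))
        node = {'title': line.lstrip(), 'children': []}
        if indent == 0:
            root = node  # assumes exactly one unindented root line
        else:
            # stack[indent - 1] is the latest node one level up: the parent
            stack[indent - 1]['children'].append(node)
        stack[indent:] = [node]  # drop deeper entries, record this node
    return root


source = "ROOT\n\tNode1\n\t\tNode2\n\t\t\tNode3\n\t\t\t\tNode4\n\tNode5\n\tNode6"
tree = build_tree_dicts(source.split('\n'))
print([child['title'] for child in tree['children']])  # ['Node1', 'Node5', 'Node6']
```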

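You'd have to write a fair amount of code to parse and validate this properly. Could you use XML? Its underlying structure is a tree.

Unfortunately not, since this is more of a recursion exercise than anything else. I figured this kind of problem would be fairly common.

Could this be a homework question? If so, it's etiquette to add the homework tag.

Nope, personal interest. It's been a while since I've done any recursion.

If that's the case, this really isn't Python-specific; it's more of a general algorithm musing.

@martineau: You're right, I meant to replace inFile in the function with source; that's fixed now.

It looks to me like the last_line argument is always passed in as None, so it could just be a local variable initialized to source.readline().rstrip() before the while loop (with the check for None removed).

@martineau: Right again, edited accordingly. While writing this I did some tinkering and wasn't sure whether each recursion/return would line up with the next line of input. Since I called this a "shortened" version, I figured I'd better squeeze all the air out, eh?

Great style, especially the is_tab definition.

Without takewhile (faster and cleaner): for line in lines: body = line.lstrip('\t'); level = len(line) - len(body); stack[level:] = (body,)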
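Here is the last comment's takewhile-free variant written out as a runnable function; the name build_stacks is made up for illustration. Because lstrip('\t') removes only tabs, len(line) - len(body) is exactly the number of leading tabs. Assigning a one-element tuple to the slice still leaves stack a list. The function yields a copy of the stack after each line instead of printing it, so the result can be checked; for the example input it produces the same stacks as the takewhile version above.

```python
def build_stacks(lines):
    """Yield the path from the root to each line's node, using leading tabs as the level."""
    stack = []
    for line in lines:
        body = line.lstrip('\t')
        level = len(line) - len(body)  # number of leading tabs
        stack[level:] = (body,)        # drop deeper entries, append this node
        yield list(stack)


source = "ROOT\n\tNode1\n\t\tNode2\n\t\t\tNode3\n\t\t\t\tNode4\n\tNode5\n\tNode6"
for path in build_stacks(source.split('\n')):
    print(path)
```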