Converting a string to a tree structure in Python


I have a string in Python that takes the following form:

line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
line d

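You get the idea. In effect, it takes a form quite similar to Python code itself: there is a line, and below that line, indentation indicates part of a block headed by the most recent line with a smaller indentation.

What I need to do is parse this into a tree structure, such that each root-level line is a key of a dictionary, and its value is a dictionary representing all of its sub-lines. So the above would be: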

{
'line a' => {},
'line b' => {
  'line ba' => {},
  'line bb' => {
    'line bba' => {}
    },
  'line bc' => {}
  },
'line c' => {
  'line ca' => {
    'line caa' => {}
    },
  },
'line d' => {}
}

Here is what I've got:

from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

def parse_message_to_tree(message):
    buf = StringIO(message)
    return parse_message_to_tree_helper(buf, 0)

def parse_message_to_tree_helper(buf, prev):
    ret = {}
    for line in buf:
        line = line.rstrip()
        index = len(line) - len(line.lstrip())
        print (line + " => " + str(index))
        if index > prev:
            ret[line.strip()] = parse_message_to_tree_helper(buf, index)
        else:
            ret[line.strip()] = {}

    return ret

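The print shows left-stripped lines and an index of 0. I didn't think lstrip() was a mutator, but either way the index should be accurate.

Any suggestions are helpful.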

EDIT: I'm not sure what went wrong before, but I tried again and it is closer to working, though still not quite right. Here is what I get now:

{'line a': {},
 'line b': {},
 'line ba': {'line bb': {},
             'line bba': {'line bc': {},
                          'line c': {},
                          'line ca': {},
                          'line caa': {},
                          'line d': {}}}}

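From the documentation of lstrip():

string.lstrip(s[, chars])
Return a copy of the string with leading characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the beginning of the string this method is called on.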
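As a quick check of that behaviour, here is a minimal sketch (the variable names are illustrative and not taken from the question) showing that lstrip() returns a new string and leaves the original untouched, so the indentation width can be computed from the difference in lengths:

```python
line = "    line bba"

# lstrip() returns a stripped copy; strings are immutable, so `line` is unchanged.
stripped = line.lstrip()

# The indentation width is the number of leading characters that were removed.
indent = len(line) - len(stripped)

print(repr(line), repr(stripped), indent)
```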
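Your code seems to work with the example text on my machine.

As mentioned, str.lstrip() is not a mutator, and the index is accurate on my system as well.

The problem is that by the time you notice that the indentation has increased, line already refers to the more deeply indented line. For example, in the first case we notice that the indentation increases at line ba, so line refers to line ba, and then inside the condition you do: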
ret[line.strip()] = parse_message_to_tree_helper(buf, index)

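This is wrong, because you end up storing whatever parse_message_to_tree_helper() returns under line ba rather than under its actual parent.

Also, once you recurse inside the function, you never come back out until the buffer has been read completely, whereas the level at which a line is stored in the dictionary depends on the recursion returning when the indentation decreases.

I'm not sure whether there is any built-in library that would help you do this, but I was able to come up with some code (based on yours):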
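The second point can be seen in isolation with a small sketch. It assumes Python 3's io.StringIO, and the helper name drain is made up for this illustration. Because the outer loop and the nested call iterate over the same buffer, the nested call consumes every remaining line and the outer loop never sees them:

```python
from io import StringIO

buf = StringIO("a\n  b\n  c\nd\n")

def drain(stream):
    # Hypothetical helper: like the recursive call, it keeps reading
    # until the shared buffer is exhausted.
    return [line.strip() for line in stream]

outer_seen = []
inner_seen = []
for line in buf:
    outer_seen.append(line.strip())
    if line.strip() == "a":
        # The nested reader shares the buffer's position with the outer loop.
        inner_seen = drain(buf)

print(outer_seen)
print(inner_seen)
```

Here the outer loop only ever handles the first line, even though "d" logically belongs to its level. This is why the helper has to return as soon as the indentation drops, and hand the line it has already read back to its caller.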
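from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

def parse_message_to_tree(message):
    buf = StringIO(message)
    return parse_message_to_tree_helper(buf, 0, None)[0]

def parse_message_to_tree_helper(buf, prev, prevline):
    ret = {}
    index = -1
    for line in buf:
        line = line.rstrip()
        index = len(line) - len(line.lstrip())
        print (line + " => " + str(index))
        if index > prev:
            ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
            if index < prev:
                return ret,prevline,index
            continue
        elif not prevline:
            ret[line.strip()] = {}
        else:
            ret[prevline.strip()] = {}
        if index < prev:
            return ret,line,index
        prevline = line
    if index == -1:
        ret[prevline.strip()] = {}
        return ret,None,index
    if prev == index:
        ret[prevline.strip()] = {}
    return ret,None,0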


Example/Demo:
>>> print(s)
line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
>>> def parse_message_to_tree(message):
...     buf = StringIO(message)
...     return parse_message_to_tree_helper(buf, 0, None)[0]
...
>>> def parse_message_to_tree_helper(buf, prev, prevline):
...     ret = {}
...     index = -1
...     for line in buf:
...         line = line.rstrip()
...         index = len(line) - len(line.lstrip())
...         print (line + " => " + str(index))
...         if index > prev:
...             ret[prevline.strip()],prevline,index = parse_message_to_tree_helper(buf, index, line)
...             if index < prev:
...                 return ret,prevline,index
...             continue
...         elif not prevline:
...             ret[line.strip()] = {}
...         else:
...             ret[prevline.strip()] = {}
...         if index < prev:
...             return ret,line,index
...         prevline = line
...     if index == -1:
...         ret[prevline.strip()] = {}
...         return ret,None,index
...     if prev == index:
...         ret[prevline.strip()] = {}
...     return ret,None,0
...
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}}}
>>> s = """line a
... line b
...   line ba
...   line bb
...     line bba
...   line bc
... line c
...   line ca
...     line caa
... line d"""
>>> pprint.pprint(parse_message_to_tree(s))
line a => 0
line b => 0
  line ba => 2
  line bb => 2
    line bba => 4
  line bc => 2
line c => 0
  line ca => 2
    line caa => 4
line d => 0
{'line a': {},
 'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
 'line c': {'line ca': {'line caa': {}}},
 'line d': {}}


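You would need to test the code for any bugs or missed cases.

Another answer, using a stack instead of recursion. It took a few iterations to get to this version. It seems to handle several possible input scenarios, but there is no guarantee that it is completely bug-free! This is a tricky problem. Hopefully my comments illustrate the right line of thinking. Thanks for sharing the problem.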
text = '''line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
line d'''

root_tree = {}
stack = []
prev_indent, prev_tree = -1, root_tree

for line in text.splitlines():

    # compute current line's indent and strip the line
    origlen = len(line)
    line = line.lstrip()
    indent = origlen - len(line)
    print(indent, line)

    # no matter what, every line has its own tree, so let's create it.
    tree = {}  

    # where to attach this new tree is dependent on indent, prev_indent
    # assume: stack[-1] was the right attach point for the previous line
    # then: let's adjust the stack to make that true for the current line

    if indent < prev_indent:
        while stack[-1][0] >= indent:
            stack.pop()
    elif indent > prev_indent:
        stack.append((prev_indent, prev_tree))

    # at this point: stack[-1] is the right attach point for the current line
    parent_indent, parent_tree = stack[-1]
    assert parent_indent < indent

    # attach the current tree
    parent_tree[line] = tree

    # update state
    prev_indent, prev_tree = indent, tree

print(len(stack))
print(stack)
print(root_tree)
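
For reuse, the same stack idea can be wrapped in a function. The following is only a sketch of a variant, not the code above: the name parse_indented is hypothetical, it pushes every line onto the stack rather than only on an indent increase, and it assumes that blank lines carry no structure and should be skipped.

```python
def parse_indented(text):
    """Build a nested dict from indentation-structured text (hypothetical helper)."""
    root = {}
    # Each stack entry is (indent, subtree). The sentinel indent of -1 is
    # smaller than any real indent, so the root is never popped.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines are ignored
        stripped = raw.lstrip()
        indent = len(raw) - len(stripped)
        # Pop until the top of the stack is strictly shallower than this line;
        # that entry is the parent.
        while stack[-1][0] >= indent:
            stack.pop()
        node = {}
        stack[-1][1][stripped.rstrip()] = node
        # This line may become the parent of the lines that follow.
        stack.append((indent, node))
    return root


sample = "line a\nline b\n  line ba\n  line bb\n    line bba\n  line bc\nline c\n  line ca\n    line caa\nline d"
tree = parse_indented(sample)
print(tree)
```

Pushing every line keeps the loop body to a single pop-until-shallower step, at the cost of a slightly deeper stack.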
