Converting a string to a tree structure in Python
I have a string in Python of the following form:
    line a
    line b
      line ba
      line bb
        line bba
      line bc
    line c
      line ca
        line caa
    line d
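You get the idea. Its form is actually very similar to Python code itself: there is a line, and the more deeply indented lines below it belong to its block, so each line's parent is the nearest preceding line with a smaller indent.
What I need to do is parse this text into a tree structure, such that every root-level line is a key of a dictionary and its value is a dictionary representing all of its child lines. So the above would become: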
    {
        'line a' => {},
        'line b' => {
            'line ba' => {},
            'line bb' => {
                'line bba' => {}
            },
            'line bc' => {}
        },
        'line c' => {
            'line ca' => {
                'line caa' => {}
            },
        },
        'line d' => {}
    }
Here is what I have so far:
    from io import StringIO

    def parse_message_to_tree(message):
        buf = StringIO(message)
        return parse_message_to_tree_helper(buf, 0)

    def parse_message_to_tree_helper(buf, prev):
        ret = {}
        for line in buf:
            line = line.rstrip()
            index = len(line) - len(line.lstrip())
            print(line + " => " + str(index))
            if index > prev:
                ret[line.strip()] = parse_message_to_tree_helper(buf, index)
            else:
                ret[line.strip()] = {}
        return ret
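The print shows every line left-stripped with an index of 0. I didn't think lstrip() was a mutator, but either way the index should be accurate.
Any suggestions would be helpful.
Edit: I'm not sure what went wrong before, but I tried again and it is closer to working, though still not quite right. Here is the output I get now: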
    {'line a': {},
     'line b': {},
     'line ba': {'line bb': {},
                 'line bba': {'line bc': {},
                              'line c': {},
                              'line ca': {},
                              'line caa': {},
                              'line d': {}}}}
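From the documentation for lstrip():
    string.lstrip(s[, chars])
    Return a copy of the string with leading characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the beginning of the string this method is called on.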
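A quick check of that point, as a minimal sketch: `lstrip()` returns a new string and leaves the original untouched, so the indentation can be measured by comparing lengths. The helper name `indent_of` is made up here for illustration and is not part of the question's code.

```python
line = "    line bba\n"

stripped = line.lstrip()          # returns a new string
assert stripped == "line bba\n"
assert line == "    line bba\n"   # the original is unchanged (str is immutable)


def indent_of(line):
    """Count leading whitespace characters, ignoring trailing whitespace."""
    line = line.rstrip()
    return len(line) - len(line.lstrip())


print(indent_of(line))  # 4
```

Note that this counts characters, so a tab counts as one; it assumes the input is indented consistently with spaces.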
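Your code seems to work with the sample text on my machine. As was mentioned earlier, str.lstrip() is not a mutator, and the index comes out accurate on my system as well.
The problem is that by the time you notice the indentation has increased, line already refers to the line with the larger indent. For example, in the first case the indentation increases at line ba, so line points to line ba, and then inside the condition you do:
    ret[line.strip()] = parse_message_to_tree_helper(buf, index)
This is wrong, because you are assigning whatever parse_message_to_tree_helper() returns to line ba rather than to its actual parent.
Also, once you recurse into the function you never come back out until the buffer has been read completely, whereas the level at which a line is stored in the dictionary depends on returning from the recursion as soon as the indentation decreases.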
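The second point can be seen in isolation with a small sketch. `drain` is a hypothetical stand-in for the recursive call: it iterates over the same buffer, so it consumes every remaining line and the outer loop has nothing left when control returns.

```python
from io import StringIO

buf = StringIO("line a\nline b\n  line ba\nline c\n")


def drain(buf):
    # Hypothetical helper: like the recursive call, it loops over the same buffer.
    return [line.strip() for line in buf]


outer_seen = []
inner_seen = []
for line in buf:
    outer_seen.append(line.strip())
    if line.strip() == "line b":
        inner_seen = drain(buf)  # reads everything that is left

print(outer_seen)  # ['line a', 'line b']
print(inner_seen)  # ['line ba', 'line c']
```

Here `line c` ends up in the inner list even though it is a root-level line, which is the same effect that pushes later lines too deep in the question's output.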
I'm not sure whether there is a built-in library that would help you do this, but I was able to come up with some code (based on yours):
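    from io import StringIO

    def parse_message_to_tree(message):
        buf = StringIO(message)
        return parse_message_to_tree_helper(buf, 0, None)[0]

    def parse_message_to_tree_helper(buf, prev, prevline):
        # Returns (subtree, pending line, indent of the pending line).
        ret = {}
        index = -1
        for line in buf:
            line = line.rstrip()
            index = len(line) - len(line.lstrip())
            print(line + " => " + str(index))
            if index > prev:
                # Deeper indent: the children belong to the previous line.
                ret[prevline.strip()], prevline, index = parse_message_to_tree_helper(buf, index, line)
                if index < prev:
                    return ret, prevline, index
                continue
            elif not prevline:
                ret[line.strip()] = {}
            else:
                ret[prevline.strip()] = {}
            if index < prev:
                # Shallower indent: hand the current line back to the caller.
                return ret, line, index
            prevline = line
        if index == -1:
            # The buffer was already exhausted when this call started.
            ret[prevline.strip()] = {}
            return ret, None, index
        if prev == index:
            ret[prevline.strip()] = {}
        return ret, None, 0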
Example/demo:
    >>> print(s)
    line a
    line b
      line ba
      line bb
        line bba
      line bc
    line c
      line ca
        line caa
    >>> import pprint
    >>> from io import StringIO
    >>> def parse_message_to_tree(message):
    ...     buf = StringIO(message)
    ...     return parse_message_to_tree_helper(buf, 0, None)[0]
    ...
    >>> def parse_message_to_tree_helper(buf, prev, prevline):
    ...     ret = {}
    ...     index = -1
    ...     for line in buf:
    ...         line = line.rstrip()
    ...         index = len(line) - len(line.lstrip())
    ...         print(line + " => " + str(index))
    ...         if index > prev:
    ...             ret[prevline.strip()], prevline, index = parse_message_to_tree_helper(buf, index, line)
    ...             if index < prev:
    ...                 return ret, prevline, index
    ...             continue
    ...         elif not prevline:
    ...             ret[line.strip()] = {}
    ...         else:
    ...             ret[prevline.strip()] = {}
    ...         if index < prev:
    ...             return ret, line, index
    ...         prevline = line
    ...     if index == -1:
    ...         ret[prevline.strip()] = {}
    ...         return ret, None, index
    ...     if prev == index:
    ...         ret[prevline.strip()] = {}
    ...     return ret, None, 0
    ...
    >>> pprint.pprint(parse_message_to_tree(s))
    line a => 0
    line b => 0
      line ba => 2
      line bb => 2
        line bba => 4
      line bc => 2
    line c => 0
      line ca => 2
        line caa => 4
    {'line a': {},
     'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
     'line c': {'line ca': {'line caa': {}}}}
    >>> s = """line a
    ... line b
    ...   line ba
    ...   line bb
    ...     line bba
    ...   line bc
    ... line c
    ...   line ca
    ...     line caa
    ... line d"""
    >>> pprint.pprint(parse_message_to_tree(s))
    line a => 0
    line b => 0
      line ba => 2
      line bb => 2
        line bba => 4
      line bc => 2
    line c => 0
      line ca => 2
        line caa => 4
    line d => 0
    {'line a': {},
     'line b': {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}},
     'line c': {'line ca': {'line caa': {}}},
     'line d': {}}
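You will need to test the code for any bugs or missed cases.

Another answer, using a stack instead of recursion. It took a few iterations to arrive at this version. It seems to handle several possible input scenarios, but there is no guarantee that it is completely bug-free! This really is a tricky problem. Hopefully my comments illustrate the right line of thinking. Thanks for sharing the problem.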
    text = '''line a
    line b
      line ba
      line bb
        line bba
      line bc
    line c
      line ca
        line caa
    line d'''

    root_tree = {}
    stack = []
    prev_indent, prev_tree = -1, root_tree

    for line in text.splitlines():
        # compute current line's indent and strip the line
        origlen = len(line)
        line = line.lstrip()
        indent = origlen - len(line)
        print(indent, line)

        # no matter what, every line has its own tree, so let's create it.
        tree = {}

        # where to attach this new tree is dependent on indent, prev_indent
        # assume: stack[-1] was the right attach point for the previous line
        # then: let's adjust the stack to make that true for the current line
        if indent < prev_indent:
            while stack[-1][0] >= indent:
                stack.pop()
        elif indent > prev_indent:
            stack.append((prev_indent, prev_tree))

        # at this point: stack[-1] is the right attach point for the current line
        parent_indent, parent_tree = stack[-1]
        assert parent_indent < indent

        # attach the current tree
        parent_tree[line] = tree

        # update state
        prev_indent, prev_tree = indent, tree
        print(len(stack))
        print(stack)

    print(root_tree)
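For reuse, the same stack idea can be wrapped in a function. The following is a sketch of a slight variation rather than the code above verbatim: it pushes every line onto the stack and pops all entries that are not ancestors, which removes the need to track the previous line separately. The name `parse_indented` and the decision to skip blank lines are assumptions made for this example, and indentation is assumed to use spaces consistently.

```python
def parse_indented(text):
    """Build a nested dict from indented lines (stack-based sketch)."""
    root = {}
    # Each entry is (indent, children dict); -1 is a sentinel below any real indent.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no structure
        name = raw.strip()
        indent = len(raw) - len(raw.lstrip())
        # Drop entries that are not ancestors of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        node = {}
        stack[-1][1][name] = node  # attach to the nearest shallower line
        stack.append((indent, node))
    return root


text = """line a
line b
  line ba
  line bb
    line bba
  line bc
line c
  line ca
    line caa
line d"""

tree = parse_indented(text)
print(tree["line b"])  # {'line ba': {}, 'line bb': {'line bba': {}}, 'line bc': {}}
```

Because the sentinel entry has indent -1, it is never popped, so `stack[-1]` always exists when a line is attached.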