Algorithm/Python to parse a log file into nested open and close pairs
I'm working on reading a log file. I've successfully converted each line of the log file into a Python dict containing facts about that line, which means I have the file in memory as a list, like this:
[
{'keyword':'a','is_pair':True,'type':'open','details':'iwiv','linenumber':5},
{'keyword':'a','is_pair':True,'type':'open','details':'83fi','linenumber':200},
{'keyword':'a','is_pair':True,'type':'open','details':'28c8','linenumber':360},
{'keyword':'a','is_pair':True,'type':'close','details':'28c8','linenumber':365},
{'keyword':'a','is_pair':True,'type':'open','details':'28c8','linenumber':370},
{'keyword':'a','is_pair':True,'type':'close','details':'28c8','linenumber':375},
{'keyword':'a','is_pair':True,'type':'open','details':'aowq','linenumber':400},
{'keyword':'b','is_pair':True,'type':'open','details':'pwiv','linenumber':520},
{'keyword':'b','is_pair':True,'type':'close','details':'pwiv','linenumber':528},
{'keyword':'d','is_pair':False,'details':'9393','linenumber':600},
{'keyword':'b','is_pair':True,'type':'open','details':'viao','linenumber':740},
{'keyword':'b','is_pair':True,'type':'close','details':'viao','linenumber':741},
{'keyword':'b','is_pair':True,'type':'open','details':'viao','linenumber':750},
{'keyword':'b','is_pair':True,'type':'close','details':'viao','linenumber':777},
{'keyword':'a','is_pair':True,'type':'close','details':'aowq','linenumber':822},
{'keyword':'a','is_pair':True,'type':'close','details':'83fi','linenumber':850},
{'keyword':'a','is_pair':True,'details':'iwiv','linenumber':990},
{'keyword':'c','is_pair':False,'details':'1212','linenumber':997},
]
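What I want to do is "pair up" the "nearest neighbors" whose "keyword" matches, like matching nested brackets, and dump the output into some standardized nested text syntax such as XML or JSON.
I already know which keywords are "standalone" and which "should be matched"; that is flagged as "is_pair" in my input.
I'd like to give each pair I combine some kind of "line range". For the one-liners, I don't care whether they get a "start" and "end" with the same number, an empty end, or a completely different tag (as in my examples), etc.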
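One way to read "the nearest close with the same keyword that hasn't been claimed yet" is to keep a separate stack per keyword. The sketch below is a flat version (no nesting yet) that produces a line range for every pair, and a one-line range with start == end for standalone rows. It assumes the rows are dicts shaped like the list above and sorted by 'linenumber'; line_ranges is a hypothetical name, and a paired row without a 'type' (like the one at line 990 above) is treated as an error.

```python
from collections import defaultdict


def line_ranges(rows):
    """Return (keyword, start, end, details) tuples sorted by start line.

    Assumes rows are sorted by 'linenumber'. A close is matched with the most
    recent still-unclaimed open that has the same keyword (one stack per keyword).
    """
    open_by_keyword = defaultdict(list)  # keyword -> stack of unclaimed opens
    ranges = []
    for row in rows:
        line = row['linenumber']
        if not row['is_pair']:
            # Standalone rows get a one-line range: start == end
            ranges.append((row['keyword'], line, line, row['details']))
        elif row.get('type') == 'open':
            open_by_keyword[row['keyword']].append(row)
        elif row.get('type') == 'close':
            opens = open_by_keyword[row['keyword']]
            if not opens:
                raise ValueError(f"line {line}: close for {row['keyword']!r} without an open")
            opened = opens.pop()  # nearest unclaimed open with this keyword
            ranges.append((row['keyword'], opened['linenumber'], line, opened['details']))
        else:
            raise ValueError(f"line {line}: paired row has no 'type'")
    # Anything still on a stack never found its close
    leftovers = [r['linenumber'] for stack in open_by_keyword.values() for r in stack]
    if leftovers:
        raise ValueError(f"unclosed opens at lines {sorted(leftovers)}")
    return sorted(ranges, key=lambda r: r[1])
```

Per-keyword stacks accept interleaved pairs such as a-open, b-open, a-close, b-close, which a single shared stack would reject as improper nesting; which behaviour is right depends on what the log is allowed to contain.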
Here are some examples of output:
Example 1
iwiv
83fi
28c8
28c8
aowq
pwiv
9393
viao
viao
1212
Example 2
Example 3
[
  {
    'keyword':'a',
    'start':5,
    'end':990,
    'details':'iwiv',
    'inner':[
      {
        'keyword':'a',
        'start':200,
        'end':850,
        'details':'83fi',
        'inner':[
          {'keyword':'a','details':'28c8'},
          {'keyword':'a','details':'28c8'},
          {
            'keyword':'a',
            'start':400,
            'end':822,
            'details':'aowq',
            'inner':[
              {'keyword':'b','start':520,'end':528,'details':'pwiv'},
              {'keyword':'d','linenumber':600,'details':'9393'},
              {'keyword':'b','start':740,'end':741,'details':'viao'},
              {'keyword':'b','start':750,'end':777,'details':'viao'}
            ]
          }
        ]
      }
    ]
  },
  {'keyword':'c','linenumber':997,'details':'1212'}
]
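I don't necessarily need help with the specifics of writing the JSON or XML file.
What I'm unsure about is the algorithm, especially a Pythonic one, for the "bracket matching" part of this job.
How can I turn the "linear list" into a "nested" one, where each element's "open" is matched with the next nearest "close" that has the same keyword and hasn't already been "claimed" by a better candidate?
I'd suggest using a stack to solve this. If the data is nested correctly, it is easy to solve.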
However, I added explicit error checking for incorrectly nested data, because a wrong closing tag is where the hard part comes in.
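A minimal sketch of that stack approach with error checking, building nested dicts shaped like example 3. It assumes the rows are sorted by 'linenumber'; build_tree and its error messages are hypothetical. A close must carry the same keyword as the most recent open, otherwise a ValueError is raised, and any opens left on the stack at the end also raise.

```python
def build_tree(rows):
    """Nest rows into dicts shaped like example 3, raising on bad nesting.

    Assumes rows are sorted by 'linenumber'. Unpaired rows become leaves with
    a 'linenumber'; pairs become nodes with 'start', 'end' and 'inner'.
    """
    root = []   # top-level elements
    stack = []  # currently open pair nodes, innermost last
    for row in rows:
        line = row['linenumber']
        # New elements go into the innermost open node, or the top level
        siblings = stack[-1]['inner'] if stack else root
        if not row['is_pair']:
            siblings.append({'keyword': row['keyword'], 'linenumber': line,
                             'details': row['details']})
        elif row.get('type') == 'open':
            node = {'keyword': row['keyword'], 'start': line, 'end': None,
                    'details': row['details'], 'inner': []}
            siblings.append(node)
            stack.append(node)
        elif row.get('type') == 'close':
            if not stack:
                raise ValueError(f"line {line}: close without any open")
            node = stack.pop()
            if node['keyword'] != row['keyword']:
                raise ValueError(f"line {line}: close {row['keyword']!r} does not "
                                 f"match open {node['keyword']!r} at line {node['start']}")
            node['end'] = line
            if not node['inner']:
                del node['inner']  # leaf pairs carry no 'inner', as in example 3
        else:
            raise ValueError(f"line {line}: paired row has no 'type'")
    if stack:
        raise ValueError(f"unclosed opens at lines {[n['start'] for n in stack]}")
    return root
```

Since the result is plain lists and dicts, json.dumps(build_tree(rows), indent=2) turns it into JSON.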
If your data is sorted by line number, the best approach is to use a stack. It also helps you convert the data into the nested format you want. Reusing your data, we can write:
data = \
[
{'keyword':'a', 'is_pair':True, 'type':'open', 'details':'iwiv', 'linenumber':5},
{'keyword':'a', 'is_pair':True, 'type':'open', 'details':'83fi', 'linenumber':200},
{'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':360},
{'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':365},
{'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':370},
{'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':375},
{'keyword':'a', 'is_pair':True, 'type':'open', 'details':'aowq', 'linenumber':400},
{'keyword':'b', 'is_pair':True, 'type':'open', 'details':'pwiv', 'linenumber':520},
{'keyword':'b', 'is_pair':True, 'type':'close', 'details':'pwiv', 'linenumber':528},
{'keyword':'d', 'is_pair':False, 'details':'9393', 'linenumber':600},
{'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':740},
{'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':741},
{'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':750},
{'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':777},
{'keyword':'a', 'is_pair':True, 'type':'close', 'details':'aowq', 'linenumber':822},
{'keyword':'a', 'is_pair':True, 'type':'close', 'details':'83fi', 'linenumber':850},
{'keyword':'a', 'is_pair':True, 'type':'close', 'details':'iwiv', 'linenumber':990}, # added 'type':'close'
{'keyword':'c', 'is_pair':False, 'details':'1212', 'linenumber':997},
]
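Note that I added a close to the row at line number 990; otherwise it would have no matching pair. Without that close you would lose the first line (you can check whether the stack is empty at the end to catch this).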
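To see the end-of-input check catch the problem in the question's original data, where the row at line 990 has no 'type', here is a small sketch. unclosed_opens is a hypothetical helper; it deliberately skips paired rows without a 'type' and closes that have nothing left to match, so that only the leftover stack is reported.

```python
def unclosed_opens(rows):
    """Return the line numbers of opens still on the stack after all rows.

    Assumes rows are sorted by 'linenumber'. Paired rows without a 'type',
    and closes with nothing left to match, are skipped on purpose here.
    """
    stack = []
    for row in rows:
        if not row['is_pair']:
            continue  # standalone rows never touch the stack
        if row.get('type') == 'open':
            stack.append(row['linenumber'])
        elif row.get('type') == 'close' and stack:
            stack.pop()
    return stack
```

A non-empty result means the file ended with opens still waiting for a close; the `if stack:` check in the answer's code raises in exactly that situation.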
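Now we still have to output the data in the right order. Because pairs are closed in reverse order, we sort the results by line number, which is the first item of each tuple. We then check whether the nesting level has changed: when it increases, we remember the keyword of the parent element; when it decreases, we print the stored closing tags.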
# The nesting level; it is increased on every open,
# so the first open gets a depth of 0
depth = -1
# We store the completed answers and the stacked (still open) rows.
result, stack = [], []
for row in data:
    # Check if the type is open, or if the row is unpaired
    if row.get('type') == 'open' or not row['is_pair']:
        # Store it on the stack and increase the nesting level
        stack.append(row)
        depth += 1
    # An unpaired row is closed directly; a closing row closes the last open one
    if not row['is_pair'] or row.get('type') == 'close':
        # Take the last item from the stack
        matching_open = stack.pop()
        # Keep the start line number for sorting, the depth for the layout
        # (following example 2) and the keyword for the closing tag
        result.append((matching_open['linenumber'], depth, row['keyword'],
                       f'{" " * 4 * depth}<{row["keyword"]} start="{matching_open["linenumber"]}" '
                       f'end="{row["linenumber"]}" details="{row["details"]}">'))
        # Decrease the nesting level
        depth -= 1
if stack:
    raise ValueError("There is still a value in the stack, matching is not possible!")

# Keywords of parent elements whose closing tag is still pending
temp = []
old_depth = None
old_keyword = None
# The start line number is only needed for sorting, so we discard it
for _, depth, keyword, message in sorted(result, key=lambda x: x[0]):
    # If the old depth was larger, we dropped one or more levels and
    # need to print the matching closing tags, e.g. </a>
    if old_depth is not None and old_depth > depth:
        for num in range(old_depth - depth):
            close_tag = temp.pop()
            print(f'{" " * 4 * (old_depth - num - 1)}</{close_tag}>')
    # If we went one level deeper, the previous element is the parent:
    # remember its keyword for the closing tag
    if old_depth is not None and old_depth < depth:
        temp.append(old_keyword)
    # Update the depth and keyword, then print the opening tag
    old_depth, old_keyword = depth, keyword
    print(message)
# Close any parents that are still open after the last element
for num, close_tag in enumerate(reversed(temp)):
    print(f'{" " * 4 * (old_depth - num - 1)}</{close_tag}>')
This prints:
<a start="5" end="990" details="iwiv">
    <a start="200" end="850" details="83fi">
        <a start="360" end="365" details="28c8">
        <a start="370" end="375" details="28c8">
        <a start="400" end="822" details="aowq">
            <b start="520" end="528" details="pwiv">
            <d start="600" end="600" details="9393">
            <b start="740" end="741" details="viao">
            <b start="750" end="777" details="viao">
        </a>
    </a>
</a>
<c start="997" end="997" details="1212">
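The printed output above is XML-like but not well-formed: elements without children never get a closing tag, and there is no single root element. If well-formed XML is the goal, a sketch with the standard library's xml.etree.ElementTree could look like this; to_xml and the wrapper tag 'log' are hypothetical choices, and the rows are assumed to be sorted by 'linenumber'.

```python
import xml.etree.ElementTree as ET


def to_xml(rows, root_tag='log'):
    """Build a well-formed element tree from rows sorted by 'linenumber'.

    root_tag is a hypothetical wrapper name; XML needs exactly one root element.
    """
    root = ET.Element(root_tag)
    stack = [root]  # stack[-1] is the element new children are attached to
    for row in rows:
        line = str(row['linenumber'])
        if not row['is_pair']:
            # Standalone rows: start == end, no children
            ET.SubElement(stack[-1], row['keyword'], start=line, end=line,
                          details=row['details'])
        elif row.get('type') == 'open':
            stack.append(ET.SubElement(stack[-1], row['keyword'], start=line,
                                       details=row['details']))
        elif row.get('type') == 'close':
            # Only the innermost open element with the same tag may be closed
            if len(stack) == 1 or stack[-1].tag != row['keyword']:
                raise ValueError(f"line {line}: unexpected close for {row['keyword']!r}")
            stack.pop().set('end', line)
        else:
            raise ValueError(f"line {line}: paired row has no 'type'")
    if len(stack) > 1:
        raise ValueError(f"unclosed elements starting at lines "
                         f"{[e.get('start') for e in stack[1:]]}")
    return root
```

On Python 3.9 or later, ET.indent(root) adds indentation before printing ET.tostring(root, encoding='unicode'); elements without children are written as self-closing tags such as <d ... />.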