Algorithm for parsing a log file into nested open/close pairs (Python)

python algorithm data-structures

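I'm looking for help with reading a log file.

I have already managed to transform each line of the log file, so I have a Python dict holding the facts about each line. In other words, I have the file in memory as an array, like this: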

[
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'iwiv', 'linenumber':5},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'83fi', 'linenumber':200},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':360},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':365},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':370},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':375},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'aowq', 'linenumber':400},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'pwiv', 'linenumber':520},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'pwiv', 'linenumber':528},
    {'keyword':'d', 'is_pair':False, 'details':'9393', 'linenumber':600},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':740},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':741},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':750},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':777},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'aowq', 'linenumber':822},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'83fi', 'linenumber':850},
    {'keyword':'a', 'is_pair':True, 'details':'iwiv', 'linenumber':990},
    {'keyword':'c', 'is_pair':False, 'details':'1212', 'linenumber':997},
]

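What I want to do is "pair up" the "nearest neighbors" whose 'keyword' matches, the way matching parentheses nest, and dump the output in some standardized nested text syntax such as XML or JSON.

I already know which keywords are "standalone" and which "should be matched"; this is flagged in my input as 'is_pair'.

I would like to give some kind of "line range" for each pair I combine. For the single-line entries, I don't care whether they come out as an "open" and "close" pair containing the same number, with an empty close, with a completely different label (as in my examples), and so on.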

Here are some examples of the output:

Example 1

iwiv
83fi
28c8
28c8
aowq
pwiv
9393
viao
viao
1212

Example 2

Example 3

[
    {
        'keyword': 'a',
        'start': 5,
        'end': 990,
        'details': 'iwiv',
        'inner': [
            {
                'keyword': 'a',
                'start': 200,
                'end': 850,
                'details': '83fi',
                'inner': [
                    {'keyword': 'a', 'details': '28c8'},
                    {'keyword': 'a', 'details': '28c8'},
                    {
                        'keyword': 'a',
                        'start': 400,
                        'end': 822,
                        'details': 'aowq',
                        'inner': [
                            {'keyword': 'b', 'start': 520, 'end': 528, 'details': 'pwiv'},
                            {'keyword': 'd', 'linenumber': 600, 'details': '9393'},
                            {'keyword': 'b', 'start': 740, 'end': 741, 'details': 'viao'},
                            {'keyword': 'b', 'start': 750, 'end': 777, 'details': 'viao'}
                        ]
                    }
                ]
            }
        ]
    },
    {'keyword': 'c', 'linenumber': 997, 'details': '1212'}
]

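I don't necessarily need help with the details of writing a JSON or XML file.

What I'm unsure about, in terms of the algorithm and especially a Pythonic one, is the "bracket matching" aspect of this work.

How do I turn a "linear list" into a "nested" one, where each element's 'open' is matched with the next-nearest 'close' that has the same keyword and has not already been "claimed" by a better candidate?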

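I would suggest solving this with a stack. If the data is properly nested, it is easy to solve.

However, I would add explicit error checks for improperly nested data, because the hard part of the problem starts when you get the wrong closing tag.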
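To make the stack idea with explicit error checks concrete, here is a minimal sketch. The function name `match_pairs` and the error messages are my own (hypothetical); the row layout follows the dicts in the question. It assumes that a close must always match the most recently opened keyword and treats anything else as an error rather than trying to recover.

```python
def match_pairs(rows):
    """Pair every 'close' row with the most recent unmatched 'open' row.

    Returns a list of (open_row, close_row) tuples in the order the
    pairs are closed. Raises ValueError on improperly nested input.
    """
    stack = []   # open rows that are still waiting for their close
    pairs = []
    for row in rows:
        if not row['is_pair']:
            continue  # standalone rows take no part in the matching
        kind = row.get('type')
        if kind == 'open':
            stack.append(row)
        elif kind == 'close':
            if not stack:
                raise ValueError(f"line {row['linenumber']}: close without an open")
            top = stack.pop()
            if top['keyword'] != row['keyword']:
                raise ValueError(
                    f"line {row['linenumber']}: close of {row['keyword']!r} "
                    f"but {top['keyword']!r} (line {top['linenumber']}) is still open")
            pairs.append((top, row))
        else:
            raise ValueError(f"line {row['linenumber']}: paired row has no type")
    if stack:
        raise ValueError(f"line {stack[-1]['linenumber']}: open is never closed")
    return pairs


rows = [
    {'keyword': 'a', 'is_pair': True, 'type': 'open', 'linenumber': 1},
    {'keyword': 'b', 'is_pair': True, 'type': 'open', 'linenumber': 2},
    {'keyword': 'd', 'is_pair': False, 'linenumber': 3},
    {'keyword': 'b', 'is_pair': True, 'type': 'close', 'linenumber': 4},
    {'keyword': 'a', 'is_pair': True, 'type': 'close', 'linenumber': 5},
]
spans = [(o['linenumber'], c['linenumber']) for o, c in match_pairs(rows)]
```

Raising on the first mismatch is the simplest policy. Recovering from a wrong closing tag (for example by searching further down the stack) is possible, but then you have to decide what happens to the opens that get skipped.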

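If your data is ordered by line number, the best approach is to use a stack. It can also help you convert the data into the nested format you need.

Reusing your data, we get: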

data = \
[
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'iwiv', 'linenumber':5},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'83fi', 'linenumber':200},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':360},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':365},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'28c8', 'linenumber':370},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'28c8', 'linenumber':375},
    {'keyword':'a', 'is_pair':True, 'type':'open', 'details':'aowq', 'linenumber':400},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'pwiv', 'linenumber':520},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'pwiv', 'linenumber':528},
    {'keyword':'d', 'is_pair':False, 'details':'9393', 'linenumber':600},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':740},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':741},
    {'keyword':'b', 'is_pair':True, 'type':'open', 'details':'viao', 'linenumber':750},
    {'keyword':'b', 'is_pair':True, 'type':'close', 'details':'viao', 'linenumber':777},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'aowq', 'linenumber':822},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'83fi', 'linenumber':850},
    {'keyword':'a', 'is_pair':True, 'type':'close', 'details':'iwiv', 'linenumber':990}, # added 'type':'close'
    {'keyword':'c', 'is_pair':False, 'details':'1212', 'linenumber':997},
]

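Note that I added a closing type to the entry with line number 990, otherwise there would be no matching pair. Without that closing entry you would lose the first line (you can check whether the stack is empty at the end to catch this).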
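As a small illustration of that end-of-input check, the sketch below reports which opens were never closed instead of raising. The helper name `unclosed_opens` is hypothetical, and it assumes the input is otherwise properly nested (it does not compare keywords).

```python
def unclosed_opens(rows):
    """Return the 'open' rows that are still on the stack at the end."""
    stack = []
    for row in rows:
        kind = row.get('type')
        if kind == 'open':
            stack.append(row)
        elif kind == 'close' and stack:
            stack.pop()
        # rows without a 'type' (standalone or malformed) are ignored here
    return stack


# Shortened version of the question's data: line 990 has no 'type'
sample = [
    {'keyword': 'a', 'is_pair': True, 'type': 'open', 'details': 'iwiv', 'linenumber': 5},
    {'keyword': 'a', 'is_pair': True, 'type': 'open', 'details': '83fi', 'linenumber': 200},
    {'keyword': 'a', 'is_pair': True, 'type': 'close', 'details': '83fi', 'linenumber': 850},
    {'keyword': 'a', 'is_pair': True, 'details': 'iwiv', 'linenumber': 990},
]
leftover = unclosed_opens(sample)
```

Here `leftover` holds the row from line 5, which is the entry that would otherwise be lost silently.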

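Now we still have to output the data in the right order, because the closes arrive in reverse order. We therefore sort the results by line number, which is the first item of each tuple. We check whether the nesting level changed: if we get a deeper nesting level, we store the keyword. In the case where we decrease the nestin…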

# The current nesting level. It is increased before use whenever we
# find an open, so the first open gets a depth of 0.
depth = -1

# We store the completed answers and the stack of rows that are still open.
result, stack = [], []

for row in data:
    # Check if the type is open, or if the row is unpaired
    if row.get('type') == 'open' or not row['is_pair']:

        # We store it on the stack and increase the nesting level
        stack.append(row)
        depth += 1

    # An unpaired row is closed directly (it was pushed just above);
    # a closing row closes the most recent open.
    if not row['is_pair'] or row.get('type') == 'close':

        # A close with nothing on the stack has no matching open
        if not stack:
            raise ValueError(f"Close on line {row['linenumber']} has no matching open!")

        # We get the last item on the stack
        matching_open = stack.pop()

        # We will sort on the line numbers to make sure that everything is in order;
        # we also store the depth for our layout (we are following example 2)
        result.append((matching_open['linenumber'], depth,
                       f'{" " * 4 * depth}<{row["keyword"]} start="{matching_open["linenumber"]}" '
                       f'end="{row["linenumber"]}" details="{row["details"]}">'))

        # Decrease the nesting level
        depth -= 1

if stack:
    raise ValueError("There is still a value in the stack, matching is not possible!")

# For the closing tags we need to keep track of our depth and of the
# keywords of the elements that are currently open
temp = []
old_depth = None
prev_keyword = None

# We only need the depth and message, so we discard the line number
for _, depth, message in sorted(result, key=lambda x: x[0]):

    # If the old depth was larger, we dropped one or more levels and
    # need to print a closing tag such as </a> for each of them
    if old_depth is not None and old_depth > depth:
        for num in range(old_depth - depth):
            close_open = temp.pop()
            print(f'{" " * 4 * (old_depth - num - 1)}</{close_open}>')

    # If we go one level deeper, the previous message is the parent,
    # so its keyword is the one we have to close later
    if old_depth is not None and old_depth < depth:
        temp.append(prev_keyword)

    # Update the depth, remember this message's keyword and print the message
    old_depth = depth
    prev_keyword = message.lstrip()[1:].split()[0]
    print(message)

# Close anything that is still open after the last message
while temp:
    indent = " " * 4 * (len(temp) - 1)
    print(f'{indent}</{temp.pop()}>')

<a start="5" end="990" details="iwiv">
    <a start="200" end="850" details="83fi">
        <a start="360" end="365" details="28c8">
        <a start="370" end="375" details="28c8">
        <a start="400" end="822" details="aowq">
            <b start="520" end="528" details="pwiv">
            <d start="600" end="600" details="9393">
            <b start="740" end="741" details="viao">
            <b start="750" end="777" details="viao">
        </a>
    </a>
</a>
<c start="997" end="997" details="1212">
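If you would rather end up with a nested Python structure like Example 3 and hand it to `json.dumps`, the same stack can hold the currently open nodes. The sketch below is one way to do that under a few assumptions of mine: the function name `build_tree` is hypothetical, the child list is stored under an 'inner' key, and improperly nested input is rejected with a ValueError.

```python
import json


def build_tree(rows):
    """Turn the flat list of rows into nested dicts using a stack."""
    root = []    # top-level nodes
    stack = []   # nodes that are currently open, innermost last
    for row in rows:
        # New nodes go into the innermost open node, or to the top level
        siblings = stack[-1]['inner'] if stack else root
        kind = row.get('type')
        if not row['is_pair']:
            siblings.append({'keyword': row['keyword'],
                             'linenumber': row['linenumber'],
                             'details': row['details']})
        elif kind == 'open':
            node = {'keyword': row['keyword'], 'start': row['linenumber'],
                    'end': None, 'details': row['details'], 'inner': []}
            siblings.append(node)
            stack.append(node)
        elif kind == 'close':
            if not stack or stack[-1]['keyword'] != row['keyword']:
                raise ValueError(f"line {row['linenumber']}: unexpected close")
            # The node is already in the tree; just fill in its end line
            stack.pop()['end'] = row['linenumber']
        else:
            raise ValueError(f"line {row['linenumber']}: paired row has no type")
    if stack:
        raise ValueError(f"line {stack[-1]['start']}: open is never closed")
    return root


rows = [
    {'keyword': 'a', 'is_pair': True, 'type': 'open', 'details': 'iwiv', 'linenumber': 5},
    {'keyword': 'b', 'is_pair': True, 'type': 'open', 'details': 'pwiv', 'linenumber': 520},
    {'keyword': 'b', 'is_pair': True, 'type': 'close', 'details': 'pwiv', 'linenumber': 528},
    {'keyword': 'd', 'is_pair': False, 'details': '9393', 'linenumber': 600},
    {'keyword': 'a', 'is_pair': True, 'type': 'close', 'details': 'iwiv', 'linenumber': 990},
    {'keyword': 'c', 'is_pair': False, 'details': '1212', 'linenumber': 997},
]
tree = build_tree(rows)
print(json.dumps(tree, indent=2))
```

Because each node is appended to its parent when it is opened, the children already come out in line-number order and no sorting step is needed afterwards.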