Python: matching log entries that span multiple lines

python regex

I would like to know how to match log entries that may span multiple lines in a text file, specifically using Python:

[yyyy/mm/dd time] Entry
[yyyy/mm/dd time] this is
a multiline
entry
[yyyy/mm/dd time] Another entry

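So in this scenario, my regular expression should produce 3 matches.

At best, I have a regex that matches each individual line, but it falls short when a log entry is split across several lines: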

regex = re.compile(r'\[\d{4}\/\d{2}\/\d{2}.{31}].*')

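You can pass re.S and re.MULTILINE so that the dot also matches newlines and ^ matches at the start of every line. Then match everything between two timestamps, or between a timestamp and the end of the string: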

regex = re.compile(r"^\[\d{4}\/\d{2}\/\d{2}[^\]]*](.*?)(?=^\[\d{4}\/\d{2}\/\d{2}[^\]]*])|^\[\d{4}\/\d{2}\/\d{2}[^\]]*](.*?)(?!.)", re.S | re.MULTILINE)
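As a quick check, the pattern can be run against the sample from the question. This is only a sketch: the concrete timestamps are made up for illustration (the question shows only the placeholder "[yyyy/mm/dd time]"), and the pattern is written as a raw string split over two lines for readability.

```python
import re

# The pattern from the answer, as a raw string split across two literals.
regex = re.compile(
    r"^\[\d{4}\/\d{2}\/\d{2}[^\]]*](.*?)(?=^\[\d{4}\/\d{2}\/\d{2}[^\]]*])"
    r"|^\[\d{4}\/\d{2}\/\d{2}[^\]]*](.*?)(?!.)",
    re.S | re.MULTILINE,
)

# Hypothetical sample data; the exact time format is an assumption.
sample = (
    "[2024/01/15 10:00:00] Entry\n"
    "[2024/01/15 10:00:01] this is\n"
    "a multiline\n"
    "entry\n"
    "[2024/01/15 10:00:02] Another entry\n"
)

# group(0) is the whole entry, timestamp included.
entries = [m.group(0) for m in regex.finditer(sample)]

for entry in entries:
    print(repr(entry))
```

Note that the text after the timestamp lands in group 1 for every entry except the last, which is matched by the second alternative and therefore uses group 2.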

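You can iterate over the lines and test each one: if a line matches the start of an entry, begin a new log entry; otherwise append the line to the previously captured entry, i.e.: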

import re

LINE_START = re.compile(r"\[\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}")  # etc.

with open("path/to/your.log", "r") as f:
    log_lines = []  # each element holds one complete (possibly multi-line) entry
    for line in f:
        # a new entry starts here (the very first line always opens one)
        if LINE_START.match(line) or not log_lines:
            log_lines.append("")  # 'register' a new log entry
        log_lines[-1] += line  # append the line to the last log entry
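The same loop can be wrapped in a function so it is easy to try without a file on disk. The sketch below is an illustration only: the name group_entries and the sample timestamps are hypothetical, and io.StringIO stands in for the opened log file.

```python
import io
import re

LINE_START = re.compile(r"\[\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}")


def group_entries(lines):
    """Group an iterable of lines into a list of complete log entries."""
    entries = []
    for line in lines:
        # Start a new entry on a timestamp line, or on the very first line.
        if LINE_START.match(line) or not entries:
            entries.append("")
        entries[-1] += line
    return entries


# Hypothetical sample data standing in for the log file.
sample = io.StringIO(
    "[2024/01/15 10:00:00] Entry\n"
    "[2024/01/15 10:00:01] this is\n"
    "a multiline\n"
    "entry\n"
    "[2024/01/15 10:00:02] Another entry\n"
)

entries = group_entries(sample)
print(len(entries))
```

Because it reads one line at a time, this approach does not need the whole file in memory at once, which may matter for large logs.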

Try multiline = re.compile(pattern, re.MULTILINE)
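One possible way to apply re.MULTILINE here, sketched under assumptions rather than taken from the comment itself: split the text at every line start that is followed by a date. The name ENTRY_START and the sample data are hypothetical, and splitting on an empty match requires Python 3.7 or later.

```python
import re

# Zero-width match at each line start that is followed by "[yyyy/mm/dd".
ENTRY_START = re.compile(r"^(?=\[\d{4}/\d{2}/\d{2})", re.MULTILINE)

# Hypothetical sample data; the exact time format is an assumption.
sample = (
    "[2024/01/15 10:00:00] Entry\n"
    "[2024/01/15 10:00:01] this is\n"
    "a multiline\n"
    "entry\n"
    "[2024/01/15 10:00:02] Another entry\n"
)

# The split yields an empty first chunk (match at position 0), so drop empties.
entries = [chunk for chunk in ENTRY_START.split(sample) if chunk]

print(len(entries))
```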