Python: pyparsing and multi-line syslog messages

python python-2.7

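I copied and pasted a PyParsing syslog parser together from various sources. It all works fine, but some of my syslog messages do not look "standard":

I need to handle multi-line syslog messages. Either I need to

append the many lines of these Java exceptions to the already parsed syslog message, or
make the left-hand side optional.

I don't know which. Right now my implementation fails because it assumes that every new line is logged by a new application. That would be the case... normally... unless it's Java.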

> Traceback (most recent call last):
>   File "/Users/wishi/PycharmProjects/Sparky_1/syslog_to_spark.py", line 39, in <module>
>     main()
>   File "/Users/wishi/PycharmProjects/Sparky_1/syslog_to_spark.py", line 34, in main
>     pattern.runTests(data)
>   File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 2305, in runTests
>     if comment is not None and comment.matches(t, False) or comments and not t:
>   File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 2205, in matches
>     self.parseString(_ustr(testString), parseAll=parseAll)
>   File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 1622, in parseString
>     loc, tokens = self._parse( instring, 0 )
>   File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 1383, in _parseNoCache
>     loc,tokens = self.parseImpl( instring, preloc, doActions )
>   File "/Users/wishi/anaconda2/envs/sparky/lib/python2.7/site-packages/pyparsing.py", line 2410, in parseImpl
>     if (instring[loc] == self.firstMatchChar and
> IndexError: string index out of range

The result contains only the first line.

So this works for syslog messages that contain no newlines.

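The simplest solution is to go back to parsing one line at a time and to keep the valid log lines in a list. If you get a valid log line, just append it to the list; if not, append the raw line to the 'message' item of the last entry in the list.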

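from pyparsing import ParseException


def main():
    valid_log_lines = []
    # call parse() on the Parser object; the raw _pattern has no parse() method
    parser = Parser()
    with open("system.log", "r") as myfile:
        for line in myfile.read().splitlines():
            try:
                # Parser.parse returns None for blank lines
                log_dict = parser.parse(line)
            except ParseException:
                # not a valid log line: treat it as a continuation of the previous message
                if valid_log_lines:
                    valid_log_lines[-1]['message'] += '\n' + line
            else:
                if log_dict is not None:
                    valid_log_lines.append(log_dict)
    return valid_log_lines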
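
To see the grouping logic in isolation, here is a minimal sketch that does not depend on pyparsing. The regex-based `parse_line`, the helper `group_multiline`, and the sample log text are hypothetical stand-ins for `Parser.parse` and `system.log`, and `ValueError` plays the role of `ParseException`.

```python
import re

# Hypothetical stand-in for Parser.parse: "Mon D HH:MM:SS host rest-of-line"
LINE_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} +\d+ \d+:\d+:\d+) (?P<hostname>\S+) (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for a valid log line, None for a blank one; raise ValueError otherwise."""
    if not line.strip():
        return None
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not a syslog line: %r" % line)
    return match.groupdict()


def group_multiline(lines, parse=parse_line):
    """Attach lines that fail to parse to the message of the previous valid entry."""
    entries = []
    for line in lines:
        try:
            entry = parse(line)
        except ValueError:
            # continuation lines seen before the first valid entry are dropped
            if entries:
                entries[-1]["message"] += "\n" + line
        else:
            if entry is not None:
                entries.append(entry)
    return entries


# Made-up sample data: one Java error followed by two stack-trace lines
sample = (
    "Apr  2 09:23:09 dawn Java App[537]: [main] ERROR Unknown validation error\n"
    "java.lang.IllegalStateException: boom\n"
    "\tat ch.example.Main.main(Main.java:12)\n"
    "Apr  2 09:23:10 dawn kernel: all good\n"
)
entries = group_multiline(sample.splitlines())
print(len(entries))
print(entries[0]["message"])
```

With this sample, the two stack-trace lines end up inside the first entry's message, and the second timestamped line starts a new entry.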

To speed up the detection of invalid lines, try adding `timestamp.leaveWhitespace()`, so that any line that does not start with a timestamp in column 1 fails immediately.
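
The fail-fast idea can also be illustrated independently of pyparsing: a cheap check anchored at column 1 rejects continuation lines before any full parse is attempted. The regex and the helper name `starts_with_timestamp` are illustrative assumptions, not part of the original parser.

```python
import re

# re.match anchors at position 0, so leading whitespace makes the check fail
TIMESTAMP_AT_COL1 = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")


def starts_with_timestamp(line):
    """True only if the line begins with a syslog-style timestamp in column 1."""
    return TIMESTAMP_AT_COL1.match(line) is not None


print(starts_with_timestamp("Apr  2 09:23:09 dawn kernel: ok"))
print(starts_with_timestamp("\tat ch.example.Main.main(Main.java:12)"))
print(starts_with_timestamp("  Apr  2 09:23:09 indented"))
```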

Alternatively, you could modify the parser itself to handle multi-line log messages, but that is a longer topic.
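
As a rough illustration of treating the whole text as a sequence of multi-line records, the sketch below swaps pyparsing for a plain regular expression: every line that starts with a timestamp opens a new record, and everything up to the next such line belongs to it. The pattern, the helper name `split_records`, and the sample data are assumptions for illustration; a pyparsing grammar for the same idea would look different.

```python
import re

# A record starts at a line beginning with "Mon D HH:MM:SS " in column 1
RECORD_START = re.compile(r"(?m)^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")


def split_records(text):
    """Cut text into chunks, each running from one timestamped line to the next."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    if not starts:
        return []
    # any text before the first timestamped line is ignored
    bounds = starts + [len(text)]
    return [text[a:b].rstrip("\n") for a, b in zip(bounds, bounds[1:])]


# Made-up sample data
text = (
    "Apr  2 09:23:09 dawn Java App[537]: [main] ERROR Unknown validation error\n"
    "java.lang.IllegalStateException: boom\n"
    "\tat ch.example.Main.main(Main.java:12)\n"
    "Apr  2 09:23:10 dawn kernel: all good\n"
)
records = split_records(text)
print(len(records))
print(records[1])
```

Each record can then be handed to a per-record parser whose message part is allowed to span several lines.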

I like that you are using `runTests`, but that is more of a development tool; in real code you would probably use `parseString` or something similar.

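Do you need the traceback lines, or should they be skipped?

I want them to be part of the message. At the moment, parsing multi-line messages fails. I am starting to realize that my loop parses the string line by line (\n). My parser may not be the problem...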

import string
import sys
from pyparsing import (Word, alphas, Suppress, Combine, nums, Optional,
                       ParserElement, LineEnd, OneOrMore)
from datetime import datetime


class Parser(object):
    # log lines don't include the year, but if we don't provide one, datetime.strptime will assume 1900
    ASSUMED_YEAR = str(datetime.now().year)

    def __init__(self):
        ints = Word(nums)

        ParserElement.setDefaultWhitespaceChars(" \t")
        NL = Suppress(LineEnd())
        unicodePrintables = u''.join(unichr(c) for c in xrange(sys.maxunicode)
                                     if not unichr(c).isspace())

        # priority
        # priority = Suppress("<") + ints + Suppress(">")

        # timestamp
        month = Word(string.ascii_uppercase, string.ascii_lowercase, exact=3)
        day = ints
        hour = Combine(ints + ":" + ints + ":" + ints)

        timestamp = month + day + hour
        # a parse action will convert this timestamp to a datetime
        timestamp.setParseAction(
            lambda t: datetime.strptime(Parser.ASSUMED_YEAR + ' ' + ' '.join(t), '%Y %b %d %H:%M:%S'))

        # hostname
        # usually hostnames follow some convention
        hostname = Word(alphas + nums + "_-.")

        # appname
        # if you call your app "my big fat app with a very long name" go away
        appname = (Word(alphas + nums + "/-_.()") + Optional(Word(" ")) + Optional(Word(alphas + nums + "/-_.()")))(
            "appname") + (Suppress("[") + ints("pid") + Suppress("]")) | (Word(alphas + "/-_.")("appname"))
        appname.setName("appname")

        # message
        # supports messages with printed unicode
        message = Combine(OneOrMore(Word(unicodePrintables) | OneOrMore("\t") | OneOrMore(" "))) +  Suppress(OneOrMore(NL))
        messages = OneOrMore(message) # does not work

        # pattern build
        # (add results names to make it easier to access parsed fields)
        self._pattern = timestamp("timestamp") + hostname("hostname") + Optional(appname) + Optional(Suppress(':')) + messages("message")

    def parse(self, line):
        # blank lines are skipped; the caller gets None for them
        if not line.strip():
            return None
        parsed = self._pattern.parseString(line)
        return parsed.asDict()

[datetime.datetime(2018, 4, 2, 9, 23, 9), 'dawn', 'Java', 'App', '537', '[main] ERROR ch.databin.core.Verifier - Unknown validation error']
- appname: ['Java', 'App']
- hostname: 'dawn'
- message: '[main] ERROR ch.databin.core.Verifier - Unknown validation error'
- pid: '537'
- timestamp: datetime.datetime(2018, 4, 2, 9, 23, 9)
