Converting a string to a Python dictionary using recursion and regular expressions


I need to convert a string into a dictionary. More specifically, I need to parse audit log messages into a dictionary.

Here are some sample strings:

msg=audit(1234902.147:88): pid=254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/grep" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'

msg=audit(432787023.324:77): pid=1254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/tail" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'
What I want is:

{
  msg: 'audit(...',
  pid: ...,
  uid: ...,
  mess: {
    op: PAM...,
    grantors: pam_unix...
  }
}
I know I need a regular expression and that it needs to be recursive, but I would really appreciate some help.

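Here you go (with the help of some regular expressions). The code for this answer is the listing that starts with import re, after the comments further down.
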

Here is one possibility, and no regular expressions were harmed in the process:

import shlex
from collections import OrderedDict

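def split_on_equals_to_dict(string_to_split):
    """Recursively split shell-style key=value tokens into an OrderedDict."""
    split_dict = OrderedDict()
    # shlex.split honours quotes, so msg='...' stays a single token
    for item in shlex.split(string_to_split):
        number_of_equals = item.count('=')
        if number_of_equals == 0:
            # bare token without '=': keep it as a key with no value
            split_dict[item] = None
        elif number_of_equals == 1:
            key, value = item.split('=')
            split_dict[key] = value
        else:
            # several '=' signs: treat the value as nested key=value pairs
            tag, value = item.split('=', 1)
            split_dict[tag] = split_on_equals_to_dict(value)
    return split_dict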

log_str="""audit(123.123:123): pid=2514 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/sudo" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'"""
log_dict = split_on_equals_to_dict(log_str)

There are some ambiguities in the string provided. I dealt with them by using an OrderedDict.
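To see why the recursion in the answer above works, it may help to look at what shlex.split returns for a value that contains nested quotes. This is a minimal sketch; the shortened sample string is an assumption made for illustration and is not one of the original log lines.

```python
import shlex

# Shortened sample line (an assumption for illustration, not a real log entry).
sample = "pid=254 msg='op=PAM:accounting acct=\"lemoney\" res=success'"

# First pass: the single-quoted group survives as one token, outer quotes removed.
outer_tokens = shlex.split(sample)

# Second pass: split the value of the msg token again to reach the inner pairs.
inner_text = outer_tokens[1].split('=', 1)[1]
inner_tokens = shlex.split(inner_text)

print(outer_tokens)
print(inner_tokens)
```

The msg token still contains several = signs after the first pass, which is the condition the function uses to decide that a value should be parsed again.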

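Nice. Thank you very much, sir. I am working on the for loop to stop it from overwriting the first msg value (the one with 123.123:123). I am refactoring, but any help would be greatly appreciated.

Each one is a separate entry.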
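One possible way around the overwriting mentioned in the comment above is to store the nested group under a different key when the outer key is already taken. The sketch below is not the answerer's code: the function name parse_audit_line is hypothetical, the fallback key "mess" is borrowed from the desired output in the question, and the sample line is shortened for illustration.

```python
import shlex

def parse_audit_line(line):
    """Parse one audit line into a dict (hypothetical helper)."""
    result = {}
    for item in shlex.split(line):
        if '=' not in item:
            # bare token without '=': keep it as a key with no value
            result[item] = None
            continue
        key, value = item.split('=', 1)
        if '=' in value:
            # the value holds key=value pairs of its own: parse it recursively
            nested = parse_audit_line(value)
            # assumption: fall back to "mess" when the key is already in use
            result["mess" if key in result else key] = nested
        else:
            result[key] = value
    return result

# Shortened sample line (an assumption for illustration).
line = "msg=audit(1.5:88): pid=254 msg='op=PAM:accounting acct=\"lemoney\" res=success'"
parsed = parse_audit_line(line)
print(parsed)
```

With this sample, the first msg keeps the audit(...) value and the quoted group ends up under mess instead of replacing it.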
Code for the regex-based answer:

import re

string = """
msg=audit(1234902.147:88): pid=254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/grep" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'

msg=audit(432787023.324:77): pid=1254 uid=1000 auid=1000 ses=3 subj=random_ex:random_ex:random_ex:d3-d3:w0.c12    30 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="lemoney" exe="/usr/bin/tail" hostname=? a    ddr=? terminal=/dev/pts/0 res=success'
"""

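# One log entry per line, each starting with "msg="
entries = re.compile(r'^msg=.+', re.MULTILINE)

# Outer regex: a single-quoted group or a plain key=value item
# (raw string, so \w and \S are not treated as string escapes)
rx = re.compile(r"""
    ((\w+)='([^']+)')   # quoted group, e.g. msg='...'
    |                   # or
    (\w+=\S+)           # single key=value item
    """, re.VERBOSE)

# Inner regex: key=value pairs inside the quoted group
ry = re.compile(r"(\w+)=(\S+)")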

for entry in entries.finditer(string):
    result = dict()
    for match in rx.finditer(entry.group(0)):
        if match.group(4) is not None:
            # plain item: split on the first '=' only
            key, value = match.group(4).split('=', 1)
            result[key] = value
        else:
            # quoted group: parse its contents into a nested dict
            inner = dict()
            for m in ry.finditer(match.group(3)):
                inner[m.group(1)] = m.group(2)
            result["mess"] = inner

    print(result)
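The regex listing above relies on the fact that, in an alternation, the groups of the branch that did not match are None. A small sketch of that behaviour follows; it reuses the same outer pattern, and the shortened sample string is an assumption made for illustration.

```python
import re

# Same outer pattern as in the listing above.
rx = re.compile(r"""
    ((\w+)='([^']+)')   # quoted group
    |
    (\w+=\S+)           # plain item
    """, re.VERBOSE)

# Shortened sample (an assumption for illustration).
sample = "pid=254 msg='op=PAM:accounting res=success'"

kinds = []
for match in rx.finditer(sample):
    if match.group(4) is not None:
        # second branch matched: groups 1 to 3 are None
        kinds.append(("plain", match.group(4)))
    else:
        # first branch matched: group 4 is None
        kinds.append(("quoted", match.group(2), match.group(3)))

print(kinds)
```

Checking group(4) against None makes the branch explicit, which avoids depending on an exception to tell the two cases apart.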