Using a regular expression in Python to add quotes so that a Python dictionary is returned

python regex for-loop dictionary

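Using Python and regular expressions, I want to put quotes around every word. At the moment I can only add quotes for the first index. When I loop over my final result, I get a string; I want a Python dictionary instead. I assumed that adding quotes would help me get a dictionary rather than a string. Can anyone point me in the right direction?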

Code:

import re
raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

line_re = re.compile(r'\{[^\}]+\}')
records = line_re.findall(raw)

record_re = re.compile(
    r"""
            id:\s*\'(?P<id>[^']+)\'\s*
            startdate:\s*(?P<startdate>\d+)\s*
            numvaluelist:\s*(?P<numvaluelist>[\d\.]+)\s*
            datelist:\s*(?P<datelist>\d+)\s*
            """,
    re.X
    )

record_parsed = record_re.search(line_re.findall(raw)[0])
record_parsed.groupdict()
# {'startdate': '42521', 'numvaluelist': '0.1065599566767107', 'datelist': '42521', 'id': 'A(long) 11A'}

for record in records:
    record_parsed = record_re.search(record)
    print(type(record))  # record is still the raw matched string, not a dict

Desired output (everything in quotes):

{'id': 'A(long) 11A' 'startdate': '42521' 'numvaluelist': '0.1065599566767107' 'datelist': '42521'}
{'id': 'A(short) 11B' 'startdate': '42521' 'numvaluelist': '0.0038113334533441123' 'datelist': '42521' }
{'id': 'B(long) 11C' 'startdate': '42521' 'numvaluelist': '20.061623176440904' 'datelist': '42521'}

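This looks like an XY problem. Your real goal is to parse the text data into Python dictionaries. Adding quotes is one way to get there (assuming you plan to parse the result with eval()), but it is the long way around.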
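To make the quote-adding route concrete, here is a minimal sketch of what it might involve for a single record. It is an illustration under assumptions, not the asker's code: the names `pair_re` and `quote_pair` are made up for this example, and `ast.literal_eval` is swapped in for `eval()` because it only accepts literals. Note that commas have to be added as well as quotes before the text becomes a valid dict literal.

```python
import ast
import re

record_text = "{id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}"

# key, then either a single-quoted value or a bare value that stops at whitespace or "}"
pair_re = re.compile(r"(\w+):\s*(?:'([^']*)'|([^\s}]+))")

def quote_pair(match):
    """Rewrite one `key: value` pair as `'key': 'value',`."""
    key = match.group(1)
    value = match.group(2) if match.group(2) is not None else match.group(3)
    return "%r: %r," % (key, value)

literal = pair_re.sub(quote_pair, record_text)  # now a valid dict literal
record = ast.literal_eval(literal)
print(record["id"], record["startdate"])
```

The rewritten text has to be built and then parsed a second time, which is why extracting the pairs directly is the shorter path.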

Instead, parse it directly. You don't even need a regular expression, and it is clearer what you are doing. Here is a quick and dirty attempt:

from collections import OrderedDict

raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

record = OrderedDict()
records = []

tokens = iter(raw.split())
previous_token = ""

for token in tokens:
    if previous_token == "{id:":
        record["id"] = token.lstrip("'")
        # get the rest of the ID up to closing quote
        for token in tokens:
            record["id"] += " " + token
            if token.endswith("'"):
                record["id"] = record["id"].rstrip("'")
                break
    elif previous_token in ("startdate:", "numvaluelist:"):
        # store the value under its own key (the label without the trailing colon)
        record[previous_token.rstrip(":")] = token
    elif previous_token == "datelist:":
        record["datelist"] = token.partition("}")[0]
        # record is complete; start new one
        records.append(record)
        record = OrderedDict()
    previous_token = token

Once you have it as Python data, you can of course print it however you like, including, just for fun, the format you asked for:

for record in records:
    print("{%s}" % ", ".join(repr(k) + ": " + repr(record[k]) for k in record))

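Here is one way to parse it with regular expressions. Instead of matching each field exactly as the original pattern does, it uses a more general key/value pattern: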

import re
raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

line_re = re.compile(r'\{[^\}]+\}')
value_re = re.compile(r"(\w+): ('[^']*'|\S+)")

data = []
lines = line_re.findall(raw)
for line in lines:
    data_line = dict()
    values = re.findall(value_re, line)
    for (name, value) in values:
        if(value[-1] == '}'): value = value[:-1]  # to handle "foo}" without space
        if(value[:1] == "'"): value = value[1:-1]  # strip quotes
        data_line[name] = value
    data.append(data_line)

print(data)
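For comparison, the named-group pattern from the question can also produce one dictionary per record if groupdict() is collected for every match rather than only the first. The following is a small sketch of that idea; using `finditer` over the whole string (instead of the two-step split into `{...}` blocks) is a simplification chosen for this example.

```python
import re

raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

# Same named groups as in the question, in verbose mode.
record_re = re.compile(
    r"""
    id:\s*'(?P<id>[^']+)'\s*
    startdate:\s*(?P<startdate>\d+)\s*
    numvaluelist:\s*(?P<numvaluelist>[\d.]+)\s*
    datelist:\s*(?P<datelist>\d+)
    """,
    re.X,
)

# groupdict() returns a real dict for each match; all values remain strings.
records = [match.groupdict() for match in record_re.finditer(raw)]

for record in records:
    print(record)
```

Because the pattern names every field, blocks with a different layout (such as time_statistics) simply do not match and are skipped.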

Find whoever produces this data and ask them to serialize it as JSON.

Good use of OrderedDict. I think it is best to preserve the order, just in case.
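If the producer could emit JSON, parsing would reduce to a single standard-library call, and key order could still be kept. The sketch below assumes a hypothetical payload (`json_text` is invented for illustration and is not what the real source emits); `object_pairs_hook` is a documented parameter of `json.loads`.

```python
import json
from collections import OrderedDict

# Hypothetical payload: how one record might look if it were serialized as JSON.
json_text = (
    '[{"id": "A(long) 11A", "startdate": 42521, '
    '"numvaluelist": 0.1065599566767107, "datelist": 42521}]'
)

# object_pairs_hook builds each JSON object as an OrderedDict, preserving key order.
records = json.loads(json_text, object_pairs_hook=OrderedDict)

first = records[0]
print(list(first.keys()))
print(first["startdate"] + 1)  # numbers arrive as numbers, not strings
```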
