Regex in Python: adding quotes so that a Python dictionary is returned


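Using Python and regular expressions, I want to put quotes around every word. At the moment I can only add quotes to the first index. When I loop over my final result, I get a string, but I want a Python dictionary instead. I thought adding quotes would help me get a dictionary rather than a string. Can anyone point me in the right direction?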

Code

import re

raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

line_re = re.compile(r'\{[^\}]+\}')
records = line_re.findall(raw)

record_re = re.compile(
    r"""
            id:\s*\'(?P<id>[^']+)\'\s*
            startdate:\s*(?P<startdate>\d+)\s*
            numvaluelist:\s*(?P<numvaluelist>[\d\.]+)\s*
            datelist:\s*(?P<datelist>\d+)\s*
            """,
    re.X
    )

record_parsed = record_re.search(line_re.findall(raw)[0])
record_parsed.groupdict()
# {'startdate': '42521', 'numvaluelist': '0.1065599566767107', 'datelist': '42521', 'id': 'A(long) 11A'}

for record in records:
    record_parsed = record_re.search(record)
    # record is still the raw matched text, so this prints <class 'str'>
    print(type(record))
Desired output (everything in quotes):

{'id': 'A(long) 11A', 'startdate': '42521', 'numvaluelist': '0.1065599566767107', 'datelist': '42521'}
{'id': 'A(short) 11B', 'startdate': '42521', 'numvaluelist': '0.0038113334533441123', 'datelist': '42521'}
{'id': 'B(long) 11C', 'startdate': '42521', 'numvaluelist': '20.061623176440904', 'datelist': '42521'}
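For reference, the question's own record_re already turns a record into a dictionary: record_parsed.groupdict() is a dict, whereas record inside the loop is only the matched text, which is why type(record) is str. A minimal sketch that reuses the question's patterns and collects one dict per record, skipping blocks the pattern does not match (the time_statistics block has no startdate field, so search() returns None for it):

```python
import re

raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

# same patterns as in the question
line_re = re.compile(r'\{[^\}]+\}')
record_re = re.compile(r"""
    id:\s*'(?P<id>[^']+)'\s*
    startdate:\s*(?P<startdate>\d+)\s*
    numvaluelist:\s*(?P<numvaluelist>[\d.]+)\s*
    datelist:\s*(?P<datelist>\d+)\s*
""", re.X)

results = []
for record in line_re.findall(raw):
    match = record_re.search(record)
    if match:  # None for blocks such as time_statistics {...}
        results.append(match.groupdict())  # groupdict() is the dict you want

for d in results:
    print(d)
```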
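This looks like an XY problem. Your end goal is to parse the text data into a Python dictionary. Adding quotes is one way to get there (assuming you plan to parse the result with eval()), but it is the long way round.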
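For comparison, here is a sketch of the quote-adding route applied to a single record, using ast.literal_eval (which only accepts Python literals) in place of eval(). It assumes keys are plain words followed by a colon and that values contain no colons, quotes, or braces apart from the already-quoted id; that fragility is what makes this the long way round.

```python
import ast
import re

# One record, as returned by line_re.findall(raw) in the question
record = "{id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}"

# 1. quote every key: id: -> 'id':
quoted = re.sub(r"(\w+):", r"'\1':", record)
# 2. quote every bare value (a run after a colon that is not already quoted)
quoted = re.sub(r"(:\s*)([^\s'}]+)", r"\1'\2'", quoted)
# 3. insert commas between adjacent quoted items: "11A' 'startdate" -> "11A', 'startdate"
quoted = re.sub(r"'\s+'", "', '", quoted)

# ast.literal_eval refuses arbitrary expressions, unlike eval()
result = ast.literal_eval(quoted)
print(result)
```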

Instead, parse it directly. You don't even need a regex, and it is clearer what you are doing. Here is a quick-and-dirty attempt:

from collections import OrderedDict

raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

record = OrderedDict()
records = []

tokens = iter(raw.split())
previous_token = ""

for token in tokens:
    if previous_token == "{id:":
        # The ID is quoted and may contain spaces, e.g. 'A(long) 11A':
        # drop the opening quote, then consume tokens up to the closing quote.
        record["id"] = token[1:]
        while not record["id"].endswith("'"):
            token = next(tokens)
            record["id"] += " " + token
        record["id"] = record["id"][:-1]
    elif previous_token in ("startdate:", "numvaluelist:"):
        # store under the field's own name ("startdate" or "numvaluelist")
        record[previous_token.rstrip(":")] = token
    elif previous_token == "datelist:":
        # the value may be glued to the closing brace, e.g. "42521}historic_list"
        record["datelist"] = token.partition("}")[0]
        # record is complete; start a new one
        records.append(record)
        record = OrderedDict()
    previous_token = token
Once you have it as Python data, you can of course print it however you like. For fun, here is the format you asked for:

for record in records:
    print("{%s}" % ", ".join(repr(k) + ": " + repr(record[k]) for k in record))

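Here is a way to parse it with regular expressions. Rather than matching every field exactly, as your original pattern does, it picks up any "name: value" pair, so it is probably more general: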

import re

raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

# each {...} block is one record
line_re = re.compile(r'\{[^\}]+\}')
# "name: value", where value is a quoted string or a run of non-spaces
value_re = re.compile(r"(\w+): ('[^']*'|\S+)")

data = []
for line in line_re.findall(raw):
    data_line = dict()
    for name, value in value_re.findall(line):
        if value.endswith('}'):  # handle "foo}" with no space before the brace
            value = value[:-1]
        if value.startswith("'"):  # strip the surrounding quotes
            value = value[1:-1]
        data_line[name] = value
    data.append(data_line)

print(data)
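To see why the loop above strips a trailing } and the surrounding quotes, this small sketch prints the raw pairs that value_re.findall returns for two of the records: a bare value at the end of a block keeps the closing brace, and quoted values (including the empty '') keep their quotes.

```python
import re

# same pattern as in the answer above
value_re = re.compile(r"(\w+): ('[^']*'|\S+)")

record = "{id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}"
pairs = value_re.findall(record)
for name, value in pairs:
    print(name, repr(value))  # datelist comes back as '42521}'

# '[^']*' (not '[^']+') is what lets an empty quoted value match
empty_case = value_re.findall("{job_id: '' portfolio_id: '112341'}")
print(empty_case)
```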

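Find the source of the data and ask them to provide it serialized as JSON.

Good use of OrderedDict. I think it is best to preserve the order, just in case.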
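As a sketch of what the JSON suggestion would buy you: the JSON shape below is hypothetical (the real producer's format is unknown), but if the data arrived like this, json.loads with object_pairs_hook=OrderedDict would return dictionaries directly, in field order, with no quoting or regex. Note that numbers come back as int/float rather than strings, and that on Python 3.7+ plain dicts already keep insertion order, so the hook mainly matters on older versions.

```python
import json
from collections import OrderedDict

# Hypothetical JSON version of two of the records; the real format is unknown.
payload = """
{"historic_list": [
    {"id": "A(long) 11A", "startdate": 42521, "numvaluelist": 0.1065599566767107, "datelist": 42521},
    {"id": "B(long) 11C", "startdate": 42521, "numvaluelist": 20.061623176440904, "datelist": 42521}
]}
"""

# object_pairs_hook=OrderedDict keeps each record's fields in source order
data = json.loads(payload, object_pairs_hook=OrderedDict)
for record in data["historic_list"]:
    print(record["id"], record["numvaluelist"])
```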