Regex in Python to add quotes so that a Python dictionary is returned
Using Python and regular expressions, I want to put quotes around every word. At the moment I can only add quotes at the first index. When I loop over my final result, I get a string; I want a Python dictionary instead. I thought that adding quotes would help me get a dictionary rather than a string. Can anyone point me in the right direction?

Code:
import re

raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"

# Grab every {...} block from the raw text.
line_re = re.compile(r'\{[^\}]+\}')
records = line_re.findall(raw)

# Named groups for the four fields of a historic_list record.
record_re = re.compile(
    r"""
    id:\s*\'(?P<id>[^']+)\'\s*
    startdate:\s*(?P<startdate>\d+)\s*
    numvaluelist:\s*(?P<numvaluelist>[\d\.]+)\s*
    datelist:\s*(?P<datelist>\d+)\s*
    """,
    re.X
)

record_parsed = record_re.search(line_re.findall(raw)[0])
record_parsed.groupdict()
# {'startdate': '42521', 'numvaluelist': '0.1065599566767107', 'datelist': '42521', 'id': 'A(long) 11A'}

for record in records:
    record_parsed = record_re.search(record)
    print(type(record))  # record is still the raw string, not a dict
Desired output, with everything in quotes:
{'id': 'A(long) 11A' 'startdate': '42521' 'numvaluelist': '0.1065599566767107' 'datelist': '42521'}
{'id': 'A(short) 11B' 'startdate': '42521' 'numvaluelist': '0.0038113334533441123' 'datelist': '42521' }
{'id': 'B(long) 11C' 'startdate': '42521' 'numvaluelist': '20.061623176440904' 'datelist': '42521'}
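This looks like an XY problem. Your ultimate goal is to parse the text data into a Python dictionary. Adding quotes is one way to get there (assuming you then plan to use eval() to parse it), but it is the long way around.

Instead, parse it directly. You don't even need a regex, and it is a lot clearer what you are doing. Here is a quick and dirty stab at it: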
from collections import OrderedDict
raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"
record = OrderedDict()
records = []
tokens = iter(raw.split())
previous_token = ""

for token in tokens:
    if previous_token == "{id:":
        record["id"] = token.lstrip("'")
        # get the rest of the ID up to closing quote
        # (assumes the quoted ID spans at least two tokens, as in this data)
        for token in tokens:
            record["id"] += " " + token
            if token.endswith("'"):
                record["id"] = record["id"].rstrip("'")
                break
    elif previous_token in ("startdate:", "numvaluelist:"):
        # the key is the label without its trailing colon
        record[previous_token.rstrip(":")] = token
    elif previous_token == "datelist:":
        record["datelist"] = token.partition("}")[0]
        # record is complete; start new one
        records.append(record)
        record = OrderedDict()
    previous_token = token
Once you have it as Python data, you can of course print it any way you like, including, just for fun, the format you asked for:

for record in records:
    print("{%s}" % ", ".join(repr(k) + ": " + repr(record[k]) for k in record))
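For comparison, the quoting route mentioned above can be made to work, and sketching it shows why it is the longer path: the text has to be rewritten into a valid dict literal (quoted keys, quoted values, commas) before it can be evaluated. The sketch below is only an illustration. `quote_record` is a hypothetical helper, the sample is shortened, and `ast.literal_eval` is swapped in for `eval()` because it only accepts literals. It assumes values never contain quotes or backslashes.

```python
import ast
import re

# Shortened sample in the same format as the question's data.
raw = ("[historic_list {id: 'A(long) 11A' startdate: 42521 "
       "numvaluelist: 0.1065599566767107 datelist: 42521}"
       "historic_list {id: 'A(short) 11B' startdate: 42521 "
       "numvaluelist: 0.0038113334533441123 datelist: 42521 }]")


def quote_record(block):
    """Rewrite "{id: 'x' startdate: 1}" as a dict literal (hypothetical helper)."""
    body = block.strip("{} ")
    # Each pair is a bare key, a colon, then a quoted string or a bare token.
    pairs = re.findall(r"(\w+):\s*('[^']*'|[^\s}]+)", body)
    items = []
    for key, value in pairs:
        if not value.startswith("'"):
            value = "'%s'" % value  # quote bare values such as 42521
        items.append("'%s': %s" % (key, value))
    return "{" + ", ".join(items) + "}"


blocks = re.findall(r"\{[^}]+\}", raw)
records = [ast.literal_eval(quote_record(b)) for b in blocks]
print(records[0])
```

Every step here exists only to produce text that Python can evaluate, which is the work direct parsing avoids.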
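Here is one way to parse it using regular expressions. There is probably a more general way than matching each part exactly as it appears in the original.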
import re
raw = "[historic_list {id: 'A(long) 11A' startdate: 42521 numvaluelist: 0.1065599566767107 datelist: 42521}historic_list {id: 'A(short) 11B' startdate: 42521 numvaluelist: 0.0038113334533441123 datelist: 42521 }historic_list {id: 'B(long) 11C' startdate: 42521 numvaluelist: 20.061623176440904 datelist: 42521}time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"
line_re = re.compile(r'\{[^\}]+\}')
value_re = re.compile(r"(\w+): ('[^']*'|\S+)")
data = []
lines = line_re.findall(raw)
for line in lines:
    data_line = dict()
    values = value_re.findall(line)
    for name, value in values:
        if value[-1] == '}':
            value = value[:-1]  # to handle "foo}" without space
        if value[:1] == "'":
            value = value[1:-1]  # strip quotes
        data_line[name] = value
    data.append(data_line)
print(data)
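The named-group pattern from the question is also close to working on its own. The loop there prints the type of the raw string rather than the match's `groupdict()`. A minimal sketch, assuming the four fields always appear in this order and adding the braces to the pattern so it can run over the whole text with `finditer`:

```python
import re

raw = (
    "[historic_list {id: 'A(long) 11A' startdate: 42521 "
    "numvaluelist: 0.1065599566767107 datelist: 42521}"
    "historic_list {id: 'A(short) 11B' startdate: 42521 "
    "numvaluelist: 0.0038113334533441123 datelist: 42521 }"
    "historic_list {id: 'B(long) 11C' startdate: 42521 "
    "numvaluelist: 20.061623176440904 datelist: 42521}"
    "time_statistics {job_id: '' portfolio_id: '112341'} UrlPairList {}]"
)

# Same named groups as in the question, wrapped in the literal braces.
record_re = re.compile(
    r"""
    \{\s*
    id:\s*'(?P<id>[^']+)'\s*
    startdate:\s*(?P<startdate>\d+)\s*
    numvaluelist:\s*(?P<numvaluelist>[\d.]+)\s*
    datelist:\s*(?P<datelist>\d+)\s*
    \}
    """,
    re.VERBOSE,
)

# groupdict() turns each match into a plain dict of strings.
records = [m.groupdict() for m in record_re.finditer(raw)]
for record in records:
    print(record)
```

Blocks that do not have this exact shape, such as `time_statistics {...}`, are skipped, so this variant is stricter than the generic key/value pattern above.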
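Find the source of the data and ask them to provide JSON serialization instead.

Good use of OrderedDict. I think it is best to preserve the order, just in case.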
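If the source could emit JSON, the parsing would shrink to a single call. The payload below is purely hypothetical: it is a guess at how the same records might look as JSON, not something the original data source is known to produce. `object_pairs_hook=OrderedDict` combines both comments by keeping the keys in document order.

```python
import json
from collections import OrderedDict

# Hypothetical payload: the same records, if the source emitted JSON.
payload = """
{"historic_list": [
  {"id": "A(long) 11A", "startdate": 42521,
   "numvaluelist": 0.1065599566767107, "datelist": 42521},
  {"id": "A(short) 11B", "startdate": 42521,
   "numvaluelist": 0.0038113334533441123, "datelist": 42521}
]}
"""

# object_pairs_hook builds each JSON object as an OrderedDict.
data = json.loads(payload, object_pairs_hook=OrderedDict)
first = data["historic_list"][0]
print(list(first.keys()))
print(type(first["startdate"]), type(first["numvaluelist"]))
```

A side benefit is that numbers arrive as `int` and `float` rather than as strings.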