How do I convert a string with no quotes, containing arrays and sub-dictionaries, into a dictionary in Python?

The following string has no quotes and contains arrays and sub-dictionaries:

s ='{source: [s3, kinesis], aws_access_key_id: {myaws1, myaws2}, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}'

I want to convert it into a dictionary.

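Untested, but you could do some string cleanup and split the keys and values into a dictionary. Note that splitting on `', '` only works when no value contains a comma, so as written it fails on the list and nested-dictionary values: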

s ='{source: s3, aws_access_key_id: myaws, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}'

s = s[1:-1]
data = {i.split(': ')[0]: i.split(': ')[1] for i in s.split(', ')}
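One way to keep the split-based idea while coping with nested values is to split only on commas that sit outside any brackets, then recurse into each value. The following is a minimal sketch, not a tested general-purpose parser. The helper names `split_top_level`, `split_pair` and `parse_value` are hypothetical, and the sketch assumes that keys never contain `:` and that a brace group without `key: value` pairs (such as `{myaws1, myaws2}`) should become a set.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside [] or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def split_pair(item):
    """Return (key, value) if item looks like 'key: value', else None."""
    key, sep, value = item.partition(":")
    if not sep or any(c in key for c in "[]{}"):
        return None
    return key.strip(), value.strip()


def parse_value(text):
    """Recursively turn an unquoted literal into a str, list, dict or set."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        return [parse_value(item) for item in split_top_level(text[1:-1])]
    if text.startswith("{") and text.endswith("}"):
        items = split_top_level(text[1:-1])
        pairs = [split_pair(item) for item in items]
        if all(pair is not None for pair in pairs):
            return {key: parse_value(value) for key, value in pairs}
        # Assumption: braces without key/value pairs are treated as a set
        return {parse_value(item) for item in items}
    return text  # plain scalar, kept as a string


sample = ("{source: [s3, kinesis], aws_access_key_id: {myaws1, myaws2}, "
          "region_name: eu-west-1, "
          "prefixToTables: {Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3}, "
          "stagingPath: /zipFiles/}")
data = parse_value(sample)
print(data)
```

All scalar values stay strings here; converting numbers or booleans would need an extra step.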


You can use regular expressions to add the missing quotes to the string before trying to evaluate it:

import re
import ast

s = "{source: s3, aws_access_key_id: myaws, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}"

s = re.sub(r':\s?(?![{\[\s])([^,}]+)', r': "\1"', s) #Add quotes to dict values
s = re.sub(r'(\w+):', r'"\1":', s) #Add quotes to dict keys

def add_quotes_to_lists(match):
    return re.sub(r'([\s\[])([^\],]+)', r'\1"\2"', match.group(0))

s = re.sub(r'\[[^\]]+', add_quotes_to_lists, s) #Add quotes to list items

final = ast.literal_eval(s) #Evaluate the dictionary

print(final)

It is not the prettiest solution, and I only have one input example, so I can't vouch for its robustness, but it works for the sample provided.
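Since robustness is not guaranteed, it may help to wrap the same three substitutions in a function that reports failure instead of raising. This is only a sketch: the function name `quote_and_eval` and the choice to return `None` on failure are assumptions, not part of the answer above.

```python
import ast
import re


def _quote_list_items(match):
    # Wrap each bare list item in double quotes
    return re.sub(r'([\s\[])([^\],]+)', r'\1"\2"', match.group(0))


def quote_and_eval(text):
    """Quote values, keys and list items, then evaluate; None on failure."""
    text = re.sub(r':\s?(?![{\[\s])([^,}]+)', r': "\1"', text)  # scalar values
    text = re.sub(r'(\w+):', r'"\1":', text)                      # keys
    text = re.sub(r'\[[^\]]+', _quote_list_items, text)           # list items
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


ok = quote_and_eval("{a: b, c: [d, e], f: {g: h}}")
bad = quote_and_eval("{k: {x, y}}")  # brace group without key/value pairs
print(ok)
print(bad)
```

The second call illustrates one input shape the substitutions do not cover: a brace group such as `{myaws1, myaws2}` is left unquoted, so `ast.literal_eval` rejects it.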

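I don't think this is easy to do robustly using only built-in modules, so here is a solution that uses pyparsing. I took the pyparsing JSON parser example and modified it to recognize strings without quotes, and added a set literal for the `{myaws1, myaws2}` value.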

import pyparsing as pp
from pyparsing import pyparsing_common as ppc

def make_keyword(kwd_str, kwd_value):
    return pp.Keyword(kwd_str).setParseAction(pp.replaceWith(kwd_value))
TRUE  = make_keyword("true", True)
FALSE = make_keyword("false", False)
NULL  = make_keyword("null", None)

LBRACK, RBRACK, LBRACE, RBRACE, COLON = map(pp.Suppress, "[]{}:")

jsonString = pp.OneOrMore(pp.CharsNotIn('{}[]:,')).setParseAction(lambda s, l, t: [t[0].strip()])
jsonNumber = ppc.number()

jsonObject = pp.Forward()
jsonValue = pp.Forward()
jsonElements = pp.delimitedList( jsonValue )
jsonArray = pp.Group(LBRACK + pp.Optional(jsonElements, []) + RBRACK)
jsonSet = pp.Group(LBRACE + pp.Optional(jsonElements, []) + RBRACE).setParseAction(lambda s,l,t: set(t[0]))
jsonValue << (jsonNumber | jsonString | pp.Group(jsonObject)  | jsonArray | jsonSet | TRUE | FALSE | NULL)
memberDef = pp.Group(jsonString + COLON + jsonValue)
jsonMembers = pp.delimitedList(memberDef)
jsonObject << pp.Dict(LBRACE + pp.Optional(jsonMembers) + RBRACE)

jsonComment = pp.cppStyleComment
jsonObject.ignore(jsonComment)


if __name__ == "__main__":
    s ='{source: [s3, kinesis], aws_access_key_id: {myaws1, myaws2}, aws_secret_access_key: REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY, bucketName: bucket, region_name: eu-west-1, fileType: zip, typeIngestion: FULL, project: trackingcampaigns, functionalArea: client, filePaths: [s3Sensor/2018/], prefixFiles: [Tracking_Sent, Tracking_Bounces, Tracking_Opens, Tracking_Clicks, Tracking_SendJobs], prefixToTables: {Tracking_Bounces: MNG_TRACKING_EXTRACT_BOUNCES_3, Tracking_Sent: MNG_TRACKING_EXTRACT_SENT_3, Tracking_Clicks: MNG_TRACKING_EXTRACT_CLICKS_3, Tracking_Opens: MNG_TRACKING_EXTRACT_OPENS_3, Tracking_SendJobs: MNG_TRACKING_EXTRACT_SENDJOBS_3}, stagingPath: /zipFiles/}'

    results = jsonObject.parseString(s)
    print(results.asDict())

You can use a regular expression to format the string before preparing its `json` representation:

import re
json_string = {}
pattern = re.compile(r'(\w*):\s(\w*)')
matches = re.finditer(pattern, s)
for match in matches:
    json_string[match.group(1)] = match.group(2)
print(json_string)
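
The string follows a pattern in which each key and value of the `json` string is separated by a `:` followed by a space. `\w*` matches any number of word characters (letters, digits and underscore), and `\s` detects the whitespace. Because `\w` does not match characters such as `-`, `/` or `[`, a value like `eu-west-1` is cut off at `eu`, and values that start with `[`, `{` or `/` come back empty. The `finditer` method returns an iterable that you can loop over to get the groups in the pattern. You can read more about groups in the documentation for Python's `re` module.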
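To see what the pattern `(\w*):\s(\w*)` actually captures, here is a small sketch run on a shortened version of the question's string. The variable names `sample` and `pairs` are illustrative only.

```python
import re

sample = ("{source: [s3, kinesis], region_name: eu-west-1, "
          "fileType: zip, stagingPath: /zipFiles/}")

pattern = re.compile(r'(\w*):\s(\w*)')

# group(1) is the key, group(2) is whatever run of word characters follows ": "
pairs = {m.group(1): m.group(2) for m in pattern.finditer(sample)}
print(pairs)
```

Only `fileType` survives intact. The list, the hyphenated region and the path are truncated or empty, which is why this pattern suits flat strings with simple word-like values rather than nested input.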

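Possible duplicate.

@Ahndwoo In my case it is not a duplicate; there are no quotes.

To evaluate the string, you need quotes around the keys and the string values. You need to parse the string using delimiters such as `,` to turn it into a dictionary.

@Veilkrand That's not true. I need to do something similar.

@EricBellet Check my answer.

That is the answer from the post I shared with you. I have arrays and sub-dictionaries as values, so the solution doesn't work.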