Python 3: getting a variable into a dictionary
Tags: python, python-3.x, dictionary, web-scraping, yaml

I am trying (so far unsuccessfully) to get the output of the print command below into a dictionary, so that I can then export it to CSV.

How do I get parseddata (printed below) into a dictionary?

Sample input file:
<html>
<body>
<p>{ success:true ,results:3,rows:[{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}</p>
</body>
</html>
The output of print(parseddata) is:
{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}
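Apart from the stray closing brackets at the end, this is valid YAML. (I made a mistake in my original answer, which called it valid JSON: JavaScript objects can be declared without quoting the property names, but the portable JSON format does not allow that; YAML does.) Use PyYAML to parse the data. The manual split-ing and lstrip-ing is hurting you and making this harder than it needs to be. Just get the text, then parse it with yaml (a third-party module that has to be installed separately).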
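As an illustration of "just get the text", here is a minimal sketch that pulls the contents of the <p> element out of a page shaped like the sample input, using only the standard library. The class name ParagraphText and the shortened page string are assumptions made for this example; the question does not show which scraping tool was actually used.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Only keep text that sits inside a <p> element.
        if self.in_p:
            self.chunks.append(data)


# Shortened stand-in for the sample input file from the question.
page = (
    "<html>\n<body>\n"
    '<p>{ success:true ,results:1,rows:[{ISIN:"INE134E01011"}]}</p>\n'
    "</body>\n</html>"
)

parser = ParagraphText()
parser.feed(page)
parser.close()
payload = "".join(parser.chunks)
print(payload)
```

The resulting payload string still carries its opening and closing brackets, so nothing has to be re-added before it is handed to a parser.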
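See the PyYAML documentation for details.

This looks like a key-value mapping with ISIN as a key and "INE134E01011" as a value. But it is not JSON, because the keys are not quoted, and it is not YAML either, because a plain scalar key (i.e. a string without quotes) has to be followed by a colon and a space (": ").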
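The missing quotes around the keys can be seen, and worked around, with the standard library alone. The sketch below is not the approach taken in this answer: it swaps in a regular expression that quotes the bare keys so that json can load the whole payload. The shortened payload and the regular expression are assumptions for illustration, and the pattern would misfire if a string value ever contained a comma or brace followed by a word and a colon.

```python
import json
import re

# Shortened stand-in for the text inside <p> in the sample input.
payload = (
    '{ success:true ,results:2,rows:['
    '{ISIN:"INE134E01011",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},'
    '{ISIN:"INE134E01011",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"}]}'
)

# As-is the text is rejected, because JSON requires quoted property names.
try:
    json.loads(payload)
except json.JSONDecodeError as exc:
    print("not JSON:", exc.msg)

# Wrap every bare identifier that follows "{" or "," and precedes ":"
# in double quotes. Assumes no string value contains such a pattern.
quoted = re.sub(r'([{,])\s*([A-Za-z_]\w*)\s*:', r'\1"\2":', payload)

parsed = json.loads(quoted)
rows = parsed["rows"]  # a list of dicts, one per filing
print(len(rows), rows[0]["SeqNumber"])
```

Because the whole payload is loaded, the list under "rows" comes back complete and no brackets need to be stripped or restored.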
If you split your output string up into "parts":
test_str = (
    '{ISIN:"INE134E01011",Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"14-Aug-2015 15:39",'
    'SeqNumber:"1001577"},'
    '{ISIN:"INE134E01011",'  # new mapping starts
    'Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"30-May-2015 14:37",'
    'SeqNumber:"129901"},'
    '{ISIN:"INE134E01011",'  # new mapping starts
    'Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"17-Feb-2015 14:57",'
    'SeqNumber:"126171"}]}'
)
which is the same as your input:
test_org = '{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}'
assert test_str == test_org
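The split-up version shows that there are actually 3 mappings, followed by a ] and a }. The ] indicates a list, which is consistent with the 3 mappings being separated by commas. The matching [ went missing in your own processing: after splitting, you lstrip() it away.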
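To make the point about the lost bracket concrete, here is a small sketch on a shortened, made-up payload. The variable names are illustrative and the question's actual splitting code is not shown, so this only demonstrates the general effect: splitting on the opening bracket discards it, while slicing from the first "[" to the last "]" keeps the list delimiters balanced.

```python
# Shortened stand-in for the text inside <p> in the sample input.
raw = '{ success:true ,results:2,rows:[{ISIN:"A"},{ISIN:"B"}]}'

# Splitting on "[" throws the bracket away and keeps the trailing "]}".
after_split = raw.split("[", 1)[1]
print(after_split)

# Slicing between the first "[" and the last "]" (inclusive) keeps
# both list delimiters and drops the outer "}".
start = raw.index("[")
end = raw.rindex("]") + 1
rows_text = raw[start:end]
print(rows_text)
```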
You can easily manipulate the string so that YAML can parse it, but the result is a list:
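from ruamel.yaml import YAML

yaml = YAML(typ='safe')  # safe loader: returns plain Python lists and dicts
# re-add the opening "[", put a space after each key's colon, drop the stray final "}"
test_str = '[' + test_str.replace(':"', ': "').rstrip('}')
data = yaml.load(test_str)
print(type(data))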
which prints:
<class 'list'>
However, if your end goal is a CSV file, I see no reason to go from the list to a dict. If you take the output from the YAML parser, you can do:
import csv

with open('output.csv', 'w', newline='') as fp:
    csvwriter = csv.writer(fp)
    csvwriter.writerow(data[0].keys())  # header of common dict keys
    for elem in data:
        csvwriter.writerow(elem.values())  # values
to get a CSV file containing:
ISIN,Ind,Consolidated,Cumulative,Audited,FilingDate
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,14-Aug-2015 15:39
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,30-May-2015 14:37
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,17-Feb-2015 14:57
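If the column order matters, one option is csv.DictWriter with an explicit list of field names, so the header no longer depends on the order of the dict keys. This is a sketch under assumptions: the two shortened rows and the chosen column order are made up for the example, and it writes to an in-memory buffer rather than to output.csv.

```python
import csv
import io

# Shortened stand-in for the list of dicts produced by the parser.
data = [
    {"ISIN": "INE134E01011", "Ind": "-",
     "FilingDate": "14-Aug-2015 15:39", "SeqNumber": "1001577"},
    {"ISIN": "INE134E01011", "Ind": "-",
     "FilingDate": "30-May-2015 14:37", "SeqNumber": "129901"},
]

# Explicit column order (an arbitrary choice for this example).
fieldnames = ["SeqNumber", "ISIN", "Ind", "FilingDate"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()    # header row taken from fieldnames
writer.writerows(data)  # one row per dict, values looked up by key
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer can be replaced by a file opened with newline='', as in the snippet above.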
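¹ In the test_str definition I did not escape the newlines with \, but used parentheses to turn the multi-line definition into one string, so that I could more easily add comments on the lines.
² Rather than re-adding the [, you should of course not strip it off in the first place.

Comments:

But what does parseddata look like?

I edited the post to show what parseddata looks like. Thanks.

@zs_python: Could you provide a sample input file to process, so that people can run test cases against it?

The sample input file has been added to the question above, thanks.

Thanks ShadowRanger, I think the problem is the "stray closing bracket/brace at the end". How do I get rid of it?

@zs_python: Anticipated that, and added an example before you asked. :-) Odds are the original data is valid json, and the object you are interested in is the sole entry in an array attribute of an object with only one attribute (holding an array of one element). You could probably just json.loads the whole thing, then access and assign data_as_dict = whole_thing_as_dict['name_of_singleton_key'][0] and avoid the explicit split-ing and lstrip-ing.

Thanks for helping to remove the strays, ShadowRanger. The example above throws an error for me: JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1). I have just posted the sample input file in the question to make it clearer what I am trying to parse.

Thanks Anthon. That was perfect and did exactly the job for me! Many thanks for all the effort you put into explaining it.

Thanks @ShadowRanger, your efforts have added to my Python learning, very helpful. As a noob, I am overwhelmed by the effort you have all put into helping me learn. Thanks, and onward!

@zs_python, if this solved your problem, please consider accepting the answer (click the checkmark next to this answer). That shows others that your problem has been solved (they might not read all the way down to your comment) and marks it as such in the database.

Thanks @anthon for the help, I have accepted the answer as instructed. See you around :)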
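If you do want a dict, for example one keyed on SeqNumber, you can build it from the parsed list:

ddata = {}
for elem in data:
    k = elem.pop('SeqNumber')  # remove the key from the row and use its value as dict key
    ddata[k] = elem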
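Note that pop() removes SeqNumber from the very dicts that are stored in data, which is presumably why the CSV output shown earlier has no SeqNumber column. If the list should stay intact, a non-destructive variant can be sketched as follows; the name by_seq and the shortened rows are assumptions for this example.

```python
# Shortened stand-in for the list of dicts produced by the parser.
data = [
    {"ISIN": "INE134E01011", "FilingDate": "14-Aug-2015 15:39",
     "SeqNumber": "1001577"},
    {"ISIN": "INE134E01011", "FilingDate": "30-May-2015 14:37",
     "SeqNumber": "129901"},
]

# Key each row on its SeqNumber, copying the remaining fields into a
# new dict so the original rows in `data` are left untouched.
by_seq = {
    elem["SeqNumber"]: {k: v for k, v in elem.items() if k != "SeqNumber"}
    for elem in data
}

print(by_seq["129901"]["FilingDate"])
```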