Parsing nested JSON arrays in Python while keeping the mapping to the parent JSON object
I have a large JSON file with the following structure:
{
  "Project": {
    "AAA": {
      "Version": [
        {
          "id": "00001",
          "name": "08.12.2019",
          "description": null,
          "released": true,
          "releaseDate": "2019-08-12"
        },
        {
          "id": "00002",
          "name": "2019.8.26",
          "description": null,
          "released": true,
          "releaseDate": "2019-08-26"
        }
      ]
    },
    "BBB": {
      "Version": [
        {
          "id": "00003",
          "name": "AABBY3",
          "description": "2019 Accounting Year End",
          "released": false,
          "releaseDate": null
        },
        {
          "id": "00004",
          "name": "AACCZ4",
          "description": "Financial Statements 2019",
          "released": false,
          "releaseDate": null
        },
        {
          "id": "00005",
          "name": "AADDZ5",
          "description": null,
          "released": false,
          "releaseDate": null
        }
      ]
    }
  }
}
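Because of the nested arrays, I am having trouble converting this into a pandas DataFrame. For each Project, how can I extract all the data in each Version while keeping a reference to the Project?
So far I have only managed to get a DataFrame with the following structure: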
df.head(3)
Out[10]:
      description     id    name releaseDate  released
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True
using the following code:
import json
import pandas as pd

with open("fixVer2.json", "r") as read_file:
    data = json.load(read_file)

prj_list = ['AAA', 'BBB', 'CCC', 'DDD']
d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        d_list.append(el)  # the project name x is lost at this point
df = pd.DataFrame(d_list)
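However, the same name can appear in different Projects with different release dates, so I need to keep the Project name in order to identify the correct releaseDate for each name.
Desired output: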
      description     id    name releaseDate  released Project
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True     CCC
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True     CCC
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True     CCC
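I am not sure how to parse the nested arrays, keep the Project name, and consolidate everything into a single DataFrame or some other Python structure.

You can change your solution so that it adds the project name to each version before the append: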
d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        el['Project'] = x
        d_list.append(el)
Or use a list comprehension:
prj_list = ['AAA', 'BBB']
# The added key holds the project name, so it is called 'Project' to match the desired output.
d_list = [{**el, 'Project': x} for x in prj_list for el in data['Project'][x]['Version']]
df = pd.DataFrame(d_list)
print(df)
      id        name                description  released releaseDate Project
0  00001  08.12.2019                       None      True  2019-08-12     AAA
1  00002   2019.8.26                       None      True  2019-08-26     AAA
2  00003      AABBY3   2019 Accounting Year End     False        None     BBB
3  00004      AACCZ4  Financial Statements 2019     False        None     BBB
4  00005      AADDZ5                       None     False        None     BBB
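
As a variation on the two snippets above, the same tagging can be done on the pandas side: build one small DataFrame per project, add the project name with `assign`, and stack the pieces with `pd.concat`. This is only a sketch. It inlines a trimmed copy of the sample JSON from the question as a string (the `raw` variable is an assumption made so the example runs without a file), and it iterates over whatever projects are present instead of a hard-coded `prj_list`.

```python
import json
import pandas as pd

# Trimmed copy of the sample from the question, inlined so the sketch runs without a file.
raw = """
{"Project": {
  "AAA": {"Version": [
    {"id": "00001", "name": "08.12.2019", "description": null, "released": true, "releaseDate": "2019-08-12"},
    {"id": "00002", "name": "2019.8.26", "description": null, "released": true, "releaseDate": "2019-08-26"}]},
  "BBB": {"Version": [
    {"id": "00003", "name": "AABBY3", "description": "2019 Accounting Year End", "released": false, "releaseDate": null}]}
}}
"""
data = json.loads(raw)["Project"]

# One frame per project, each tagged with its project name, then stacked into one frame.
frames = [pd.DataFrame(proj["Version"]).assign(Project=name) for name, proj in data.items()]
df = pd.concat(frames, ignore_index=True)
print(df[["Project", "id", "name", "releaseDate"]])
```

Unlike `el['Project'] = x`, this does not modify the dictionaries loaded from the JSON, which may matter if `data` is reused later.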
Try this:
import json
import pandas as pd

with open("test.json", "r") as read_file:
    data = json.load(read_file)['Project']

d_list = []
for name, dat in data.items():
    for d in dat['Version']:
        d['Project'] = name  # tag each version with its project name
        d_list.append(d)
df = pd.DataFrame(d_list)
print(df)
  Project                description     id        name releaseDate  released
0     AAA                       None  00001  08.12.2019  2019-08-12      True
1     AAA                       None  00002   2019.8.26  2019-08-26      True
2     BBB   2019 Accounting Year End  00003      AABBY3        None     False
3     BBB  Financial Statements 2019  00004      AACCZ4        None     False
4     BBB                       None  00005      AADDZ5        None     False
With this approach you do not need to maintain a separate list of projects. Hope this helps.
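
To illustrate why the Project column matters for the duplicate-name case raised in the question, here is a small sketch that looks up a release date by the pair (Project, name). The data is hypothetical: the version name "R1" and the project "CCC" without a "Version" key are made up for the example and do not come from the original file. The `.get("Version", [])` call is likewise an assumption about how one might tolerate projects that have no versions.

```python
import pandas as pd

# Hypothetical data: the version name "R1" exists in both projects
# with different release dates; "CCC" has no "Version" key at all.
projects = {
    "AAA": {"Version": [{"id": "1", "name": "R1", "releaseDate": "2019-08-12"}]},
    "BBB": {"Version": [{"id": "2", "name": "R1", "releaseDate": "2020-01-31"}]},
    "CCC": {},
}

rows = [
    {**version, "Project": project}          # copy, so the input dicts are not mutated
    for project, body in projects.items()
    for version in body.get("Version", [])   # tolerate a missing "Version" key
]
df = pd.DataFrame(rows)

# Index by (Project, name) so a duplicated name still resolves to a single row.
release_dates = df.set_index(["Project", "name"])["releaseDate"]
print(release_dates.loc[("BBB", "R1")])
```

Looking up by `name` alone would return both rows here, which is the ambiguity the question describes.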