在Python中解析JSON嵌套数组，保留到JSON对象的映射_Python_Json_Pandas_Jsonparser

在Python中解析JSON嵌套数组，保留到JSON对象的映射

python json pandas

在Python中解析JSON嵌套数组，保留到JSON对象的映射,python,json,pandas,jsonparser,Python,Json,Pandas,Jsonparser,我有一个大JSON文件，其结构如下： { "Project": { "AAA": { "Version": [ { "id": "00001", "name": "08.12.2019", "description": null, "released"

我有一个大JSON文件，其结构如下：

    {
    "Project": {
        "AAA": {
            "Version": [
                {
                    "id": "00001",
                    "name": "08.12.2019",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-12"
                },
                {
                    "id": "00002",
                    "name": "2019.8.26",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-26"
                }
            ]
        },
        "BBB": {
            "Version": [
                {
                    "id": "00003",
                    "name": "AABBY3",
                    "description": "2019 Accounting Year End",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00004",
                    "name": "AACCZ4",
                    "description": "Financial Statements 2019",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00005",
                    "name": "AADDZ5",
                    "description": null,
                    "released": false,
                    "releaseDate": null
                }
            ]
        }
    }
}

df.head(3)
Out[10]: 
      description     id    name releaseDate  released
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True

由于嵌套数组，我在将其转换为Python数据帧时遇到问题。对于每个

项目

，如何提取每个

版本

中的所有数据，但保持对

项目

的引用

到目前为止，我只获得了以下结构的数据帧：

    {
    "Project": {
        "AAA": {
            "Version": [
                {
                    "id": "00001",
                    "name": "08.12.2019",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-12"
                },
                {
                    "id": "00002",
                    "name": "2019.8.26",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-26"
                }
            ]
        },
        "BBB": {
            "Version": [
                {
                    "id": "00003",
                    "name": "AABBY3",
                    "description": "2019 Accounting Year End",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00004",
                    "name": "AACCZ4",
                    "description": "Financial Statements 2019",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00005",
                    "name": "AADDZ5",
                    "description": null,
                    "released": false,
                    "releaseDate": null
                }
            ]
        }
    }
}

df.head(3)
Out[10]: 
      description     id    name releaseDate  released
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True

使用以下命令：

with open("fixVer2.json", "r") as read_file:
    data = json.load(read_file)

prj_list = ['AAA', 'BBB', 'CCC', 'DDD']

d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        d_list.append(el)

df = pd.DataFrame(d_list)

但是，由于不同发布日期的项目之间存在重复的

名称

，我需要保留

项目

名称，以便为每个

名称

识别正确的

发布日期

期望输出：

      description     id    name releaseDate  released  Project
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True  CCC
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True  CCC
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True  CCC

我不确定如何解析嵌套数组，保留

项目

名称详细信息，并将其整合到一个数据帧/其他Python结构中

您可以在解决方案中使用添加的版本更改append：

d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        el['Project'] = x
        d_list.append(el)

或使用列表理解：

prj_list = ['AAA', 'BBB']

d_list = [{**el, **{'version': x}} for x in prj_list for el in data['Project'][x]['Version']]
df = pd.DataFrame(d_list)
print (df)
      id        name                description  released releaseDate version
0  00001  08.12.2019                       null      True  2019-08-12     AAA
1  00002   2019.8.26                       null      True  2019-08-26     AAA
2  00003      AABBY3   2019 Accounting Year End     False        null     BBB
3  00004      AACCZ4  Financial Statements 2019     False        null     BBB
4  00005      AADDZ5                       null     False        null     BBB

您可以在解决方案中使用添加的版本更改附加：

d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        el['Project'] = x
        d_list.append(el)

或使用列表理解：

prj_list = ['AAA', 'BBB']

d_list = [{**el, **{'version': x}} for x in prj_list for el in data['Project'][x]['Version']]
df = pd.DataFrame(d_list)
print (df)
      id        name                description  released releaseDate version
0  00001  08.12.2019                       null      True  2019-08-12     AAA
1  00002   2019.8.26                       null      True  2019-08-26     AAA
2  00003      AABBY3   2019 Accounting Year End     False        null     BBB
3  00004      AACCZ4  Financial Statements 2019     False        null     BBB
4  00005      AADDZ5                       null     False        null     BBB

试试这个：

import json
import pandas as pd

with open("test.json", "r") as read_file:
    data = json.load(read_file)['Project']
d_list = []
for name,dat in data.items():
    for d in dat['Version']:
        d['Project']=name
        d_list.append(d)
df = pd.DataFrame(d_list)
print(df)

Project                description     id        name releaseDate  released
0     AAA                       None  00001  08.12.2019  2019-08-12      True
1     AAA                       None  00002   2019.8.26  2019-08-26      True
2     BBB   2019 Accounting Year End  00003      AABBY3        None     False
3     BBB  Financial Statements 2019  00004      AACCZ4        None     False
4     BBB                       None  00005      AADDZ5        None     False

使用这种方法，您不需要保留单独的项目列表。希望这有帮助

试试这个：

import json
import pandas as pd

with open("test.json", "r") as read_file:
    data = json.load(read_file)['Project']
d_list = []
for name,dat in data.items():
    for d in dat['Version']:
        d['Project']=name
        d_list.append(d)
df = pd.DataFrame(d_list)
print(df)

Project                description     id        name releaseDate  released
0     AAA                       None  00001  08.12.2019  2019-08-12      True
1     AAA                       None  00002   2019.8.26  2019-08-26      True
2     BBB   2019 Accounting Year End  00003      AABBY3        None     False
3     BBB  Financial Statements 2019  00004      AACCZ4        None     False
4     BBB                       None  00005      AADDZ5        None     False

使用这种方法，您不需要保留单独的项目列表。希望这有帮助