Flattening JSON with a Dask DataFrame

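I am trying to flatten a JSON array object (not .json files) in a Dask DataFrame. I have a lot of data, and my RAM is already taken up by constantly running processes, so I need a solution that works in parallel.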

This is my JSON:

[ {
        "id": "0001",
        "name": "Stiven",
        "location": [{
                "country": "Colombia",
                "department": "Choco",
                "city": "Quibdo"
            }, {
                "country": "Colombia",
                "department": "Antioquia",
                "city": "Medellin"
            }, {
                "country": "Colombia",
                "department": "Cundinamarca",
                "city": "Bogota"
            }
        ]
    }, {
        "id": "0002",
        "name": "Jhon Jaime",
        "location": [{
                "country": "Colombia",
                "department": "Valle del Cauca",
                "city": "Cali"
            }, {
                "country": "Colombia",
                "department": "Putumayo",
                "city": "Mocoa"
            }, {
                "country": "Colombia",
                "department": "Arauca",
                "city": "Arauca"
            }
        ]
    }, {
        "id": "0003",
        "name": "Francisco",
        "location": [{
                "country": "Colombia",
                "department": "Atlantico",
                "city": "Barranquilla"
            }, {
                "country": "Colombia",
                "department": "Bolivar",
                "city": "Cartagena"
            }, {
                "country": "Colombia",
                "department": "La Guajira",
                "city": "Riohacha"
            }
        ]
    }
]

This is my DataFrame:

index  id    name        location
0      0001  Stiven      [{'country': 'Colombia', 'department': 'Choco', 'city': 'Quibdo'}, {'country': 'Colombia', 'department': 'Antioquia', 'city': 'Medellin'}, {'country': 'Colombia', 'department': 'Cundinamarca', 'city': 'Bogota'}]
1      0002  Jhon Jaime  [{'country': 'Colombia', 'department': 'Valle del Cauca', 'city': 'Cali'}, {'country': 'Colombia', 'department': 'Putumayo', 'city': 'Mocoa'}, {'country': 'Colombia', 'department': 'Arauca', 'city': 'Arauca'}]
2      0003  Francisco   [{'country': 'Colombia', 'department': 'Atlantico', 'city': 'Barranquilla'}, {'country': 'Colombia', 'department': 'Bolivar', 'city': 'Cartagena'}, {'country': 'Colombia', 'department': 'La Guajira', 'city': 'Riohacha'}]

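I need to expand each id into one row per location, like this:

index  id    name        country   department       city
0      0001  Stiven      Colombia  Choco            Quibdo
1      0001  Stiven      Colombia  Antioquia        Medellin
2      0001  Stiven      Colombia  Cundinamarca     Bogota
3      0002  Jhon Jaime  Colombia  Valle del Cauca  Cali
4      0002  Jhon Jaime  Colombia  Putumayo         Mocoa
5      0002  Jhon Jaime  Colombia  Arauca           Arauca
6      0003  Francisco   Colombia  Atlantico        Barranquilla
7      0003  Francisco   Colombia  Bolivar          Cartagena
8      0003  Francisco   Colombia  La Guajira       Riohacha
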
The whole process must run in parallel with Dask. Any suggestions?

Thanks in advance.

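I suggest first solving this problem on a plain Pandas DataFrame, and then using the .map_partitions function to apply that function to every Pandas partition inside the Dask DataFrame.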