How to create a multi-index DataFrame from a nested dictionary (multiple levels)

Tags: json, python-3.x, pandas, dictionary, multi-index

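I am using the pyflightdata library to look up flight statistics. It returns JSON as a list of dicts.

Here is the first dictionary in the list returned by my query: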

flightlog = {'identification': {'number': {'default': 'KE504', 'alternative': 'None'}, 'callsign': 'KAL504', 'codeshare': 'None'}
, 'status': {'live': False, 'text': 'Landed 22:29', 'estimated': 'None', 'ambiguous': False, 'generic': {'status': {'text': 'landed', 'type': 'arrival', 'color': 'green', 'diverted': 'None'}
, 'eventTime': {'utc_millis': 1604611778000, 'utc_date': '20201105', 'utc_time': '2229', 'utc': 1604611778, 'local_millis': 1604615378000, 'local_date': '20201105', 'local_time': '2329', 'local': 1604615378}}}
, 'aircraft': {'model': {'code': 'B77L', 'text': 'Boeing 777-FEZ'}, 'registration': 'HL8075', 'country': {'name': 'South Korea', 'alpha2': 'KR', 'alpha3': 'KOR'}}
, 'airline': {'name': 'Korean Air', 'code': {'iata': 'KE', 'icao': 'KAL'}}
, 'airport': {'origin': {'name': 'London Heathrow Airport', 'code': {'iata': 'LHR', 'icao': 'EGLL'}, 'position': {'latitude': 51.471626, 'longitude': -0.467081, 'country': {'name': 'United Kingdom', 'code': 'GB'}, 'region': {'city': 'London'}}
, 'timezone': {'name': 'Europe/London', 'offset': 0, 'abbr': 'GMT', 'abbrName': 'Greenwich Mean Time', 'isDst': False}}, 'destination': {'name': 'Paris Charles de Gaulle Airport', 'code': {'iata': 'CDG', 'icao': 'LFPG'}, 'position': {'latitude': 49.012516, 'longitude': 2.555752, 'country': {'name': 'France', 'code': 'FR'}, 'region': {'city': 'Paris'}}, 'timezone': {'name': 'Europe/Paris', 'offset': 3600, 'abbr': 'CET', 'abbrName': 'Central European Time', 'isDst': False}}, 'real': 'None'}
, 'time': {'scheduled': {'departure_millis': 1604607300000, 'departure_date': '20201105', 'departure_time': '2115', 'departure': 1604607300, 'arrival_millis': 1604612700000, 'arrival_date': '20201105', 'arrival_time': '2245', 'arrival': 1604612700}, 'real': {'departure_millis': 1604609079000, 'departure_date': '20201105', 'departure_time': '2144', 'departure': 1604609079, 'arrival_millis': 1604611778000, 'arrival_date': '20201105', 'arrival_time': '2229', 'arrival': 1604611778}, 'estimated': {'departure': 'None', 'arrival': 'None'}, 'other': {'eta_millis': 1604611778000, 'eta_date': '20201105', 'eta_time': '2229', 'eta': 1604611778}}}

This dictionary is a huge, deeply nested JSON mess, and I am struggling to find a way to make it readable. I have something like this in mind:

 identification     number      default                 KE504
                                alternative             None
                    callsign                            KAL504
                    codeshare                           None

 status             live                                False
                    text                                Landed 22:29
                    Estimated                           None
                    ambiguous                           False
...

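I tried to convert it to a DataFrame, with mixed results.

One answer I found explains that MultiIndex values must be tuples, not dictionaries, so I used its example to convert my dictionary: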

flightlog_tuple = {(outerKey, innerKey): values for outerKey, innerDict in flightlog.items() for innerKey, values in innerDict.items()}

This worked to some extent:

df2 = pd.Series(flightlog_tuple)

which gives the following output:

identification  number                {'default': 'KE504', 'alternative': 'None'}
                callsign                                                   KAL504
                codeshare                                                    None
status          live                                                        False
                text                                                 Landed 22:29
                estimated                                                    None
                ambiguous                                                   False
                generic         {'status': {'text': 'landed', 'type': 'arrival...
aircraft        model                  {'code': 'B77L', 'text': 'Boeing 777-FEZ'}
                registration                                               HL8075
                country         {'name': 'South Korea', 'alpha2': 'KR', 'alpha...
airline         name                                                   Korean Air
                code                                {'iata': 'KE', 'icao': 'KAL'}
airport         origin          {'name': 'London Heathrow Airport', 'code': {'...
                destination     {'name': 'Paris Charles de Gaulle Airport', 'c...
                real                                                         None
time            scheduled       {'departure_millis': 1604607300000, 'departure...
                real            {'departure_millis': 1604609079000, 'departure...
                estimated                {'departure': 'None', 'arrival': 'None'}
                other           {'eta_millis': 1604611778000, 'eta_date': '202...
dtype: object

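This is roughly what I want, but some of the index levels are still stuck in the values column because the dictionary has more than two levels. So I tried to add another level in the same way: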

level_up = {(level1Key, level2Key, level3Key): values for level1Key, level2Dict in flightlog.items() for level2Key, level3Dict in level2Dict.items() for level3Key, values in level3Dict.items()}
df2 = pd.Series(level_up)

This code raises

AttributeError: 'str' object has no attribute 'items'

I don't understand why the first two index levels work but the next one raises an error.

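I have also tried other approaches, such as MultiIndex.from_tuples and DataFrame.from_dict, but I could not get them to work.

This dictionary is too complex for a beginner. I don't know what the right approach is. Maybe I am using DataFrames wrong. Maybe there is a simpler way to access the data that I am overlooking.

Any help would be greatly appreciated.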
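On the AttributeError in the question: the three-level comprehension calls .items() on every second-level value, but some of those values are plain strings (for example 'callsign': 'KAL504') rather than dicts, so the call fails as soon as it reaches one. A minimal sketch of one way around this is to recurse only into values that are dicts and pad the shorter key paths so every tuple has the same length. The names `sample`, `flatten`, and `padded` are made up for illustration, and `sample` is only a small excerpt of the flightlog dictionary.

```python
import pandas as pd

# Small excerpt of the flightlog dictionary from the question.
sample = {
    "identification": {
        "number": {"default": "KE504", "alternative": "None"},
        "callsign": "KAL504",
    },
    "status": {"live": False, "text": "Landed 22:29"},
}


def flatten(d, parent=()):
    """Yield (key_path_tuple, leaf_value) pairs for an arbitrarily nested dict."""
    for key, value in d.items():
        path = parent + (key,)
        if isinstance(value, dict):
            # Recurse only when the value really is a dict.
            yield from flatten(value, path)
        else:
            # Strings, numbers, booleans, etc. are leaves.
            yield path, value


flat = dict(flatten(sample))

# Key paths have different lengths, so pad the short ones with empty
# strings to give every tuple the same number of levels.
depth = max(len(path) for path in flat)
padded = {path + ("",) * (depth - len(path)): value for path, value in flat.items()}

series = pd.Series(padded)  # tuple keys become a MultiIndex
print(series)
```

Padding with empty strings is just one choice; any placeholder works, as long as all tuples end up with the same length.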

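You have many tables in one JSON. You need to split them out logically and relate them to each other using the relevant keys. This would be a good starting point: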

dfs = {k: pd.json_normalize(v) for k, v in flightlog.items()}
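As a rough illustration of what that one-liner produces, here is a sketch run on a small excerpt of the data (the `sample` dict is a shortened stand-in for flightlog). Each top-level key becomes its own one-row DataFrame, and nested keys are joined with a dot in the column names. Transposing a table with .T can make a single wide row easier to read.

```python
import pandas as pd

# Shortened stand-in for the flightlog dictionary from the question.
sample = {
    "identification": {
        "number": {"default": "KE504", "alternative": "None"},
        "callsign": "KAL504",
    },
    "airline": {"name": "Korean Air", "code": {"iata": "KE", "icao": "KAL"}},
}

# One DataFrame per top-level key; nested keys are flattened into
# dotted column names such as "number.default".
dfs = {k: pd.json_normalize(v) for k, v in sample.items()}

for name, table in dfs.items():
    print(name)
    print(table.T)  # transpose: one flattened key per line
    print()
```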

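I just realized that the pyflightdata API returns data with single quotes, so contrary to their documentation, it is not JSON. json_normalize partially works, but the data is still unreadable.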
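A note on the single quotes: printing a Python dict always shows strings in single quotes, so this is how an already-parsed dict looks rather than a sign of broken data. Since the question says the library returns a list of dicts, no JSON parsing should be needed. If real JSON text or a readable dump is wanted, json.dumps can produce it. The sketch below also replaces the placeholder string 'None' with a real None, which assumes those strings stand for missing values. The `clean` helper is hypothetical, and `sample` is a shortened stand-in for flightlog.

```python
import json

# Shortened stand-in for the flightlog dictionary from the question.
sample = {
    "identification": {
        "number": {"default": "KE504", "alternative": "None"},
        "callsign": "KAL504",
    },
    "status": {"live": False, "text": "Landed 22:29"},
}


def clean(obj):
    """Recursively replace the placeholder string 'None' with a real None.

    Assumption: the string 'None' marks a missing value in this data.
    """
    if isinstance(obj, dict):
        return {k: clean(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [clean(v) for v in obj]
    return None if obj == "None" else obj


cleaned = clean(sample)

# json.dumps turns the dict into real JSON text (double quotes, false, null);
# indent=2 gives a readable, indented layout.
text = json.dumps(cleaned, indent=2)
print(text)

# The JSON text parses back into an equal dict.
round_trip = json.loads(text)
```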