JSON: convert a dict of dicts to a DataFrame


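I have a slightly complex JSON that I need to convert into a DataFrame. It is the standard output JSON of another API, so the field names will not change.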

Below is the data I have. It is more complex than anything I have worked with so far:

>>> import pandas as pd
>>> data = [{'annotation_spec': {'description': 'Story_Driven',
...    'display_name': 'Story_Driven'},
...   'segments': [{'confidence': 0.52302074,
...     'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
...      'start_time_offset': {}}}]},
...  {'annotation_spec': {'description': 'real', 'display_name': 'real'},
...   'segments': [{'confidence': 0.5244379,
...     'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
...      'start_time_offset': {}}}]}]

I went through all the related posts, and the closest I could get to a DataFrame is:

from pandas.io.json import json_normalize  # older pandas; since pandas 1.0 use pd.json_normalize
pd.DataFrame.from_dict(json_normalize(data,
                                      record_path=['segments'],
                                      meta=[['annotation_spec', 'description'],
                                            ['annotation_spec', 'display_name']],
                                      errors='ignore'))
This gives me the following output:

>>> from pandas.io.json import json_normalize
>>> pd.DataFrame.from_dict(json_normalize(data,record_path=['segments'],meta=[['annotation_spec','description'],['annotation_spec','display_name']],errors='ignore'))
   confidence                                            segment annotation_spec.description annotation_spec.display_name
0    0.523021  {u'end_time_offset': {u'nanos': 973306000, u's...                Story_Driven                 Story_Driven
1    0.524438  {u'end_time_offset': {u'nanos': 973306000, u's...                        real                         real
>>>

I want to break the segment column above into its components. How can I do that?
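If you already have a frame like the one above, one way to split a column of dicts is to run pd.json_normalize on that column's values and concatenate the result back. The sketch below assumes pandas >= 1.0 (for pd.json_normalize) and builds the frame by hand from the sample values, so it does not depend on how a particular pandas version lays out the json_normalize output. The names expanded and result are illustrative only.

```python
import pandas as pd

# Hand-built frame mirroring the output above: 'segment' still holds nested dicts
df = pd.DataFrame({
    "confidence": [0.52302074, 0.5244379],
    "segment": [
        {"end_time_offset": {"nanos": 973306000, "seconds": 14}, "start_time_offset": {}},
        {"end_time_offset": {"nanos": 973306000, "seconds": 14}, "start_time_offset": {}},
    ],
    "annotation_spec.description": ["Story_Driven", "real"],
    "annotation_spec.display_name": ["Story_Driven", "real"],
})

# Flatten each dict in 'segment' into its own columns; an empty dict such as
# start_time_offset contributes no columns
expanded = pd.json_normalize(df["segment"].tolist()).add_prefix("segment.")

# Both frames use the default 0..n-1 index, so concat lines the rows up by position
result = pd.concat([df.drop(columns="segment"), expanded], axis=1)
print(result)
```

This only works cleanly when the frame has a default RangeIndex; otherwise call reset_index(drop=True) on it first so the rows align.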

Basically, json_normalize handles nested dicts; the problem here is that the value under the segments key is a list.
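A quick way to see this, assuming pandas >= 1.0 for pd.json_normalize, is to normalize the sample data directly: the annotation_spec dict is flattened into dotted columns, but the segments list ends up as a single column holding Python lists.

```python
import pandas as pd

# Sample data from the question
data = [{'annotation_spec': {'description': 'Story_Driven', 'display_name': 'Story_Driven'},
         'segments': [{'confidence': 0.52302074,
                       'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
                                   'start_time_offset': {}}}]},
        {'annotation_spec': {'description': 'real', 'display_name': 'real'},
         'segments': [{'confidence': 0.5244379,
                       'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
                                   'start_time_offset': {}}}]}]

# Nested dicts become dotted column names...
flat = pd.json_normalize(data)
print(sorted(flat.columns))
# ['annotation_spec.description', 'annotation_spec.display_name', 'segments']

# ...but the list value is left as-is in one object column
print(type(flat.loc[0, "segments"]))
# <class 'list'>
```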

So, if the list always has length 1, we can replace it with its single element and then apply json_normalize:

import pandas as pd

# Replace every list value with its first element (assumes each list has exactly one item)
def remove_list(dct):
    return {k: (v[0] if isinstance(v, list) else v) for k, v in dct.items()}

data_clean = [remove_list(entry) for entry in data]

# pd.json_normalize (pandas >= 1.0) flattens the remaining nested dicts; sep="__" joins the key levels
pd.json_normalize(data_clean, sep="__")
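If a segments list can hold more than one item, taking v[0] silently drops the rest. A sketch of one alternative, assuming pandas >= 1.0 (for pd.json_normalize and DataFrame.explode) and that no segments list is empty, is to explode the list into one row per segment before flattening it. With sep="__" the column names should come out the same as with remove_list; the intermediate names top, exploded and segs are illustrative.

```python
import pandas as pd

# Sample data from the question
data = [{'annotation_spec': {'description': 'Story_Driven', 'display_name': 'Story_Driven'},
         'segments': [{'confidence': 0.52302074,
                       'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
                                   'start_time_offset': {}}}]},
        {'annotation_spec': {'description': 'real', 'display_name': 'real'},
         'segments': [{'confidence': 0.5244379,
                       'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
                                   'start_time_offset': {}}}]}]

# One row per annotation: nested dicts are flattened, 'segments' is still a list
top = pd.json_normalize(data, sep="__")

# One row per list item: explode repeats the annotation columns for every segment
# (an empty list would turn into NaN here, which json_normalize cannot flatten)
exploded = top.explode("segments").reset_index(drop=True)

# Flatten each segment dict and place its columns next to the annotation columns
segs = pd.json_normalize(exploded["segments"].tolist(), sep="__").add_prefix("segments__")
result = pd.concat([exploded.drop(columns="segments"), segs], axis=1)
print(result)
```

The reset_index(drop=True) step matters: explode keeps the original row labels, so without it concat would misalign rows once any list has more than one element.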