JSON: convert a dict of dicts to a DataFrame
I have a somewhat complex JSON that I need to convert to a DataFrame. It is the standard output of another API, so the field names will not change. Here is what I have, which is more complex than anything I have worked with so far:
>>> import pandas as pd
>>> data = [{'annotation_spec': {'description': 'Story_Driven',
... 'display_name': 'Story_Driven'},
... 'segments': [{'confidence': 0.52302074,
... 'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
... 'start_time_offset': {}}}]},
... {'annotation_spec': {'description': 'real', 'display_name': 'real'},
... 'segments': [{'confidence': 0.5244379,
... 'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
... 'start_time_offset': {}}}]}]
I went through all the related posts, and the closest I could get to a DataFrame is:
from pandas.io.json import json_normalize
pd.DataFrame.from_dict(json_normalize(data, record_path=['segments'],
                                      meta=[['annotation_spec', 'description'],
                                            ['annotation_spec', 'display_name']],
                                      errors='ignore'))
This gives me the following output:
>>> from pandas.io.json import json_normalize
>>> pd.DataFrame.from_dict(json_normalize(data,record_path=['segments'],meta=[['annotation_spec','description'],['annotation_spec','display_name']],errors='ignore'))
   confidence                                            segment  annotation_spec.description  annotation_spec.display_name
0    0.523021  {u'end_time_offset': {u'nanos': 973306000, u's...                 Story_Driven                  Story_Driven
1    0.524438  {u'end_time_offset': {u'nanos': 973306000, u's...                         real                          real
>>>
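For comparison, a minimal sketch of the same call in newer pandas. It assumes pandas 1.0 or later, where json_normalize is exposed as pd.json_normalize and nested dicts inside each record are flattened into dotted column names, so segment comes out as separate columns. The u'' prefixes in the output above suggest an older pandas on Python 2, which evidently did not flatten the records.

```python
import pandas as pd

# Same sample data as in the question
data = [
    {"annotation_spec": {"description": "Story_Driven", "display_name": "Story_Driven"},
     "segments": [{"confidence": 0.52302074,
                   "segment": {"end_time_offset": {"nanos": 973306000, "seconds": 14},
                               "start_time_offset": {}}}]},
    {"annotation_spec": {"description": "real", "display_name": "real"},
     "segments": [{"confidence": 0.5244379,
                   "segment": {"end_time_offset": {"nanos": 973306000, "seconds": 14},
                               "start_time_offset": {}}}]},
]

# One row per element of "segments"; the nested "segment" dict in each record
# is flattened (the empty start_time_offset dict yields no columns), and the
# two meta paths are attached to every row.
df = pd.json_normalize(
    data,
    record_path=["segments"],
    meta=[["annotation_spec", "description"], ["annotation_spec", "display_name"]],
)
print(df.columns.tolist())
```

If the printed columns include segment.end_time_offset.nanos and segment.end_time_offset.seconds, no extra flattening step is needed on that pandas version.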
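I want to break the "segment" column above into its components. How can I do that?

Basically, json_normalize handles nested dicts; the problem here is the list under the segments key.
So if the list always has length 1, we can replace the list with its only element and then apply json_normalize: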
import pandas as pd

# Replace each list value with its first element (assumes every list has exactly one item)
def remove_list(dct):
    return {k: (v[0] if isinstance(v, list) else v) for k, v in dct.items()}

data_clean = [remove_list(entry) for entry in data]
pd.json_normalize(data_clean, sep="__")  # pandas >= 1.0; older: pandas.io.json.json_normalize
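The answer above keeps only the first element of each list, so extra segments would be silently dropped. If an entry can have more than one segment, one alternative (a sketch, not part of the original answer) is to keep one row per segment, as the question's record_path call does, and then expand the dict-valued segment column with pd.json_normalize. It assumes pandas 1.0 or later; rows, seg_cols and out are illustrative names, and the intermediate frame is built by hand here only to mirror the shape of the question's output.

```python
import pandas as pd

# Same sample data as in the question
data = [
    {"annotation_spec": {"description": "Story_Driven", "display_name": "Story_Driven"},
     "segments": [{"confidence": 0.52302074,
                   "segment": {"end_time_offset": {"nanos": 973306000, "seconds": 14},
                               "start_time_offset": {}}}]},
    {"annotation_spec": {"description": "real", "display_name": "real"},
     "segments": [{"confidence": 0.5244379,
                   "segment": {"end_time_offset": {"nanos": 973306000, "seconds": 14},
                               "start_time_offset": {}}}]},
]

# Illustrative intermediate frame: one row per segment (any number per entry),
# with the nested "segment" dict still packed into a single column.
rows = [
    {**seg,
     "annotation_spec.description": entry["annotation_spec"]["description"],
     "annotation_spec.display_name": entry["annotation_spec"]["display_name"]}
    for entry in data
    for seg in entry["segments"]
]
df = pd.DataFrame(rows)

# Expand the dict column into its own columns (empty dicts such as
# start_time_offset contribute none), align on the row index, and replace
# the original "segment" column with the expanded ones.
seg_cols = pd.json_normalize(df["segment"].tolist()).add_prefix("segment.")
seg_cols.index = df.index
out = df.drop(columns="segment").join(seg_cols)
print(out)
```

Because rows are generated per segment rather than per entry, nothing is discarded when a segments list has several elements.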