Python: Normalizing a nested JSON response into a DataFrame with inconsistent keys and arbitrary nesting depth
I am struggling to convert a JSON response into a pandas DataFrame so that I can use it for various other operations. I tried the approaches listed above, but the problem is that I cannot use json_normalize effectively: if I pass the key I need as the record_path argument, it raises an error, because only some of the records have that key, not all of them.
I don't want to iterate over the whole JSON, compare keys one by one, and rebuild my own dictionary object.
I want to get a DataFrame with uuid and nice_to_have_skills, nice_to_have_skills_path, nice_to_have_experience. These nice_to_have attributes can be found under the nice_to_have and operands keys in the JSON object.
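Below is a sample JSON response
In my DataFrame I would like to extract something like "nice_to_have_skill" -> ["User Research", "Wireframing / Prototyping"], where nice_to_have_skill would be the column name and ["User Research", "Wireframing / Prototyping"] would be a value in that column.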
Edit:
How can this be handled if the JSON has arbitrary depth?
For example,
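{"nice_to_have": [{"operator": "AND", "operands": [{"operator": "OR",
"operands": [{"category": "Language", "values": [{"value": "Korean",
"clusters": []}]}]}], "company_name": "Framework", "company_role":
["Manufacturing", "Supply Chain / Procurement"]}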
is a part of the JSON that can have any level of nesting.
Passing d['hits'] to pd.json_normalize results in:
d = json.loads(json_text)
In [136]: %time pd.json_normalize(d['hits'])
CPU times: user 2.1 ms, sys: 41 µs, total: 2.14 ms
Wall time: 2.12 ms
Out[136]:
uuid text_about objectID search_space is_searchspace nice_to_have must_have some key some_key
0 00000000-0000-0000-0000-000000000000 some_text 00000000-0000-0000-0000-000000000000-text_about NaN NaN NaN NaN NaN NaN
1 00000000-0000-0000-0000-000000000000 NaN 00000000-0000-0000-0000-000000000000-search_space some json object True NaN NaN NaN NaN
2 00000000-0000-0000-0000-000000000000 NaN 00000000-0000-0000-0000-000000000000-nice_to_have NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN NaN
3 00000000-0000-0000-0000-000000000000 NaN 00000000-0000-0000-0000-000000000000-must_have NaN NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN some json object NaN
5 10000000-0000-0000-0000-000000000001 some text 10000000-0000-0000-0000-000000000001-text_about NaN NaN NaN NaN NaN NaN
6 10000000-0000-0000-0000-000000000001 NaN 10000000-0000-0000-0000-000000000001-search_space some json object True NaN NaN NaN NaN
7 10000000-0000-0000-0000-000000000001 NaN 10000000-0000-0000-0000-000000000001-nice_to_have NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN NaN
8 10000000-0000-0000-0000-000000000001 NaN 10000000-0000-0000-0000-000000000001-must_have NaN NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN
9 NaN NaN NaN NaN NaN NaN NaN NaN some json object
From there, you can select nice_to_have:
df = pd.json_normalize(d, record_path=['hits'])
In [263]: %time df['nice_to_have'].dropna().sum()
CPU times: user 705 µs, sys: 11 µs, total: 716 µs
Wall time: 713 µs
Out[263]:
[{'operator': 'AND',
'operands': [{'category': 'Skill',
'values': [{'value': 'MySQL ', 'clusters': []}]}]},
{'operator': 'AND',
'operands': [{'category': 'Skill',
'values': [{'value': 'Frontend Programming Language ',
'clusters': [{'key': 'Programming Language~>Frontend Programming Language',
'name': 'Frontend Programming Language',
'path': ['Programming Language', 'Frontend Programming Language'],
'uuid': 'e8c5cc6c-d92b-4098-8965-41e6818fe337',
'category': 'skill',
'pretty_lineage': ['Programming Language']}]}]}]}]
Hope this helps.
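Edit:
In response to your comment: the main problem with this JSON is that its nesting levels are inconsistent, so json_normalize cannot normalize it and raises a KeyError.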
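To make the failure concrete, here is a minimal sketch that reproduces it. The two records in hits are made up for illustration and are much smaller than the real response; the only point is that the first one lacks the nice_to_have key.

```python
import pandas as pd

# Hypothetical records: only the second one carries the "nice_to_have" key.
hits = [
    {"uuid": "a", "text_about": "some_text"},
    {"uuid": "a", "nice_to_have": [{"operator": "AND", "operands": []}]},
]

try:
    pd.json_normalize(hits, record_path=["nice_to_have"])
    error = None
except KeyError as exc:
    # Raised because the first record has no "nice_to_have" key.
    error = exc

# Keeping only the records that have the key avoids the error.
with_key = [hit for hit in hits if "nice_to_have" in hit]
df = pd.json_normalize(with_key, record_path=["nice_to_have"])
```

Every element passed to json_normalize has to contain the full record_path, which is why the records are filtered first.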
One workaround for getting nice_to_have:
f = list(filter(lambda x: 'nice_to_have' in x, d['hits']))
>> pd.json_normalize(f, ['nice_to_have', 'operands', 'values', 'clusters'])
key name path uuid category pretty_lineage
0 Programming Language~>Frontend Programming Lan... Frontend Programming Language [Programming Language, Frontend Programming La... e8c5cc6c-d92b-4098-8965-41e6818fe337 skill [Programming Language]
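If uuid should appear next to the extracted values, the meta argument of json_normalize can carry it along. The following is only a sketch: the dictionary d is a made-up sample shaped like the output above, and it assumes a fixed nesting depth (nice_to_have -> operands -> values), so it would not cope with operands nested inside other operands.

```python
import pandas as pd

# Hypothetical sample shaped like the output above; real responses may differ.
d = {
    "hits": [
        {"uuid": "u1", "text_about": "some_text"},
        {
            "uuid": "u1",
            "nice_to_have": [
                {
                    "operator": "AND",
                    "operands": [
                        {
                            "category": "Skill",
                            "values": [{"value": "MySQL", "clusters": []}],
                        }
                    ],
                }
            ],
        },
    ]
}

# Keep only hits that actually contain the key used in record_path.
f = [hit for hit in d["hits"] if "nice_to_have" in hit]

# One row per entry in "values"; uuid and category are carried along as metadata.
values_df = pd.json_normalize(
    f,
    record_path=["nice_to_have", "operands", "values"],
    meta=["uuid", ["nice_to_have", "operands", "category"]],
)
```

The nested meta path ends up as a column named nice_to_have.operands.category, which can be renamed afterwards if a shorter name is preferred.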
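From there you can get the values you want. A similar workaround can be applied to get must_have.
Yes, I found that out; I should have mentioned it in the question as well. But is there a way to pass nice_to_have as the root in json_normalize, so that I can also drop the check for that field?
Oh, nice example of using lambda and filter! Thanks.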
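For the arbitrary-depth case raised in the question's edit, json_normalize with a fixed record_path is not enough, so one option is a small recursive walk before building the DataFrame. This is a sketch, not a pandas feature: collect_values and hits_to_rows are hypothetical helper names, the sample hits are made up, and the code assumes that leaf nodes look like {"category": ..., "values": [{"value": ...}]} while grouping nodes look like {"operator": ..., "operands": [...]}.

```python
from collections import defaultdict


def collect_values(node, found=None):
    """Walk nested operator/operands nodes and group leaf values by category."""
    if found is None:
        found = defaultdict(list)
    if isinstance(node, list):
        for item in node:
            collect_values(item, found)
    elif isinstance(node, dict):
        if "category" in node:
            # Leaf node: {"category": ..., "values": [{"value": ...}, ...]}
            for entry in node.get("values", []):
                found[node["category"].lower()].append(entry["value"].strip())
        # Grouping node: {"operator": ..., "operands": [...]}, at any depth
        collect_values(node.get("operands", []), found)
    return found


def hits_to_rows(hits, key="nice_to_have"):
    """Build one flat row per hit that contains `key`; other hits are skipped."""
    rows = []
    for hit in hits:
        if key not in hit:
            continue
        row = {"uuid": hit.get("uuid")}
        for category, values in collect_values(hit[key]).items():
            row[f"{key}_{category}"] = values
        rows.append(row)
    return rows


# Hypothetical sample with operands nested two levels deep.
hits = [
    {"uuid": "u1", "text_about": "some_text"},
    {
        "uuid": "u1",
        "nice_to_have": [
            {
                "operator": "AND",
                "operands": [
                    {
                        "operator": "OR",
                        "operands": [
                            {
                                "category": "Language",
                                "values": [{"value": "Korean", "clusters": []}],
                            }
                        ],
                    },
                    {
                        "category": "Skill",
                        "values": [{"value": "MySQL ", "clusters": []}],
                    },
                ],
            }
        ],
    },
]

rows = hits_to_rows(hits)
```

Passing rows to pd.DataFrame gives one row per hit with list-valued columns such as nice_to_have_skill, and calling hits_to_rows(hits, key="must_have") applies the same walk to must_have.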