Python: Normalize a nested JSON response into a DataFrame with non-uniform keys at arbitrary nesting levels


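I am struggling to convert a JSON response into a pandas DataFrame so that I can use it for various other operations. I have tried the approaches listed above. The problem is that I cannot use json_normalize effectively: if I pass the key I need as the record_path argument, it raises an error, because only some of the records have this key, not all of them. I do not want to iterate over the whole JSON, compare keys one by one, and rebuild my own dictionary object. I would like to get a DataFrame with uuid, nice_to_have_skills, nice_to_have_skills_path and nice_to_have_experience. These nice_to_have attributes can be found in the JSON object under the nice_to_have -> operands key.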

Here is a sample JSON response

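I want to extract something like "nice_to_have_skill" -> ["User Research", "Wireframing/Prototyping"] into my DataFrame, where nice_to_have_skill would be the column name and ["User Research", "Wireframing/Prototyping"] would be one value in that column.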

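Edit: How should this be handled if the JSON has arbitrary depth? For example,

{"nice_to_have": [{"operator": "AND", "operands": [{"operator": "OR", "operands": [{"category": "Language", "values": [{"value": "Korean", "clusters": []}]}]}], "company_name": "Framework", "company_role": ["Manufacturing", "Supply Chain/Procurement"]}

is part of the JSON, and it can have any level of nesting.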


Passing d['hits'] to pd.json_normalize results in:

import json
import pandas as pd

# json_text holds the raw JSON response from the question
d = json.loads(json_text)

In [136]: %time pd.json_normalize(d['hits'])                                                                                                                                                                                                                                       
CPU times: user 2.1 ms, sys: 41 µs, total: 2.14 ms
Wall time: 2.12 ms
Out[136]: 
                                   uuid text_about                                           objectID      search_space is_searchspace                                       nice_to_have                                          must_have          some key          some_key
0  00000000-0000-0000-0000-000000000000  some_text    00000000-0000-0000-0000-000000000000-text_about               NaN            NaN                                                NaN                                                NaN               NaN               NaN
1  00000000-0000-0000-0000-000000000000        NaN  00000000-0000-0000-0000-000000000000-search_space  some json object           True                                                NaN                                                NaN               NaN               NaN
2  00000000-0000-0000-0000-000000000000        NaN  00000000-0000-0000-0000-000000000000-nice_to_have               NaN            NaN  [{'operator': 'AND', 'operands': [{'category':...                                                NaN               NaN               NaN
3  00000000-0000-0000-0000-000000000000        NaN     00000000-0000-0000-0000-000000000000-must_have               NaN            NaN                                                NaN  [{'operator': 'AND', 'operands': [{'category':...               NaN               NaN
4                                   NaN        NaN                                                NaN               NaN            NaN                                                NaN                                                NaN  some json object               NaN
5  10000000-0000-0000-0000-000000000001  some text    10000000-0000-0000-0000-000000000001-text_about               NaN            NaN                                                NaN                                                NaN               NaN               NaN
6  10000000-0000-0000-0000-000000000001        NaN  10000000-0000-0000-0000-000000000001-search_space  some json object           True                                                NaN                                                NaN               NaN               NaN
7  10000000-0000-0000-0000-000000000001        NaN  10000000-0000-0000-0000-000000000001-nice_to_have               NaN            NaN  [{'operator': 'AND', 'operands': [{'category':...                                                NaN               NaN               NaN
8  10000000-0000-0000-0000-000000000001        NaN     10000000-0000-0000-0000-000000000001-must_have               NaN            NaN                                                NaN  [{'operator': 'AND', 'operands': [{'category':...               NaN               NaN
9                                   NaN        NaN                                                NaN               NaN            NaN                                                NaN                                                NaN               NaN  some json object
From there, you can select nice_to_have:

df = pd.json_normalize(d, record_path=['hits'])

In [263]: %time df['nice_to_have'].dropna().sum()                                                                                                                                                                                                                                  
CPU times: user 705 µs, sys: 11 µs, total: 716 µs
Wall time: 713 µs
Out[263]: 
[{'operator': 'AND',
  'operands': [{'category': 'Skill',
    'values': [{'value': 'MySQL ', 'clusters': []}]}]},
 {'operator': 'AND',
  'operands': [{'category': 'Skill',
    'values': [{'value': 'Frontend Programming Language ',
      'clusters': [{'key': 'Programming Language~>Frontend Programming Language',
        'name': 'Frontend Programming Language',
        'path': ['Programming Language', 'Frontend Programming Language'],
        'uuid': 'e8c5cc6c-d92b-4098-8965-41e6818fe337',
        'category': 'skill',
        'pretty_lineage': ['Programming Language']}]}]}]}]
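As a possible alternative to `.dropna().sum()`, which relies on list concatenation, `DataFrame.explode` puts each top-level condition on its own row while keeping the matching `uuid`. This is only a sketch: the `hits` list below is hypothetical, trimmed-down data shaped like the response in the question, not the real payload.

```python
import pandas as pd

# Hypothetical records modeled on the question's response:
# only some of them carry the "nice_to_have" key.
hits = [
    {"uuid": "u1", "text_about": "some_text"},
    {"uuid": "u1", "nice_to_have": [{"operator": "AND", "operands": []}]},
    {"uuid": "u2", "nice_to_have": [{"operator": "OR", "operands": []},
                                    {"operator": "AND", "operands": []}]},
]

# Records without the key simply get NaN in that column.
df = pd.json_normalize(hits)

# Drop rows without nice_to_have, then emit one row per list element.
exploded = (
    df[["uuid", "nice_to_have"]]
    .dropna(subset=["nice_to_have"])
    .explode("nice_to_have")
)

# Plain Python list of the condition dicts, in row order.
conditions = exploded["nice_to_have"].tolist()
print(exploded)
```

Each row of `exploded` then holds a single condition dict next to the `uuid` it came from, which `.sum()` on the column alone does not preserve.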
Hope this helps.

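Edit:

In response to your comment: the main problem with this JSON is that its levels are inconsistent, so normalization cannot be performed and a KeyError is raised.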
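The error can be reproduced on a tiny input. The records below are hypothetical stand-ins for `d['hits']`; the point is only that `record_path` requires the key to exist in every record.

```python
import pandas as pd

# Hypothetical, trimmed-down records: only the second one has "nice_to_have".
hits = [
    {"uuid": "u1", "text_about": "some_text"},
    {"uuid": "u1", "nice_to_have": [{"operator": "AND", "operands": []}]},
]

try:
    # The first record has no "nice_to_have" key, so the lookup fails.
    pd.json_normalize(hits, record_path=["nice_to_have"])
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```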

A workaround to get nice_to_have:

f = list(filter(lambda x: 'nice_to_have' in x, d['hits']))  

>> pd.json_normalize(f, ['nice_to_have', 'operands', 'values', 'clusters'])

                                                 key                           name                                               path                                  uuid category          pretty_lineage
0  Programming Language~>Frontend Programming Lan...  Frontend Programming Language  [Programming Language, Frontend Programming La...  e8c5cc6c-d92b-4098-8965-41e6818fe337    skill  [Programming Language]

From there you can get the values you want. A similar workaround can be applied to get must_have.
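For the arbitrary-depth case raised in the question's edit, `json_normalize` needs a fixed path, so one option is a small recursive helper. The sketch below is an assumption-laden illustration, not part of the original answer: `collect_by_category` is a hypothetical name, the sample `hits` are made up to resemble the response, and it assumes leaf conditions carry `category` and `values` while groups nest further conditions under `operands`.

```python
import pandas as pd

def collect_by_category(node, prefix, out=None):
    """Walk nested operator/operands groups and gather leaf values per category."""
    out = {} if out is None else out
    if isinstance(node, list):
        for child in node:
            collect_by_category(child, prefix, out)
    elif isinstance(node, dict):
        # Leaf condition: has a category and a list of {"value": ...} items.
        if "category" in node and "values" in node:
            column = f"{prefix}_{node['category'].lower()}"
            out.setdefault(column, []).extend(
                item["value"].strip() for item in node["values"]
            )
        # Group: recurse into its operands, however deep they go.
        collect_by_category(node.get("operands", []), prefix, out)
    return out

# Hypothetical records; the second mixes a leaf with a nested OR group.
hits = [
    {"uuid": "u1", "text_about": "some_text"},
    {"uuid": "u1", "nice_to_have": [
        {"operator": "AND", "operands": [
            {"category": "Skill",
             "values": [{"value": "MySQL ", "clusters": []}]},
            {"operator": "OR", "operands": [
                {"category": "Language",
                 "values": [{"value": "Korean", "clusters": []}]},
            ]},
        ]},
    ]},
]

# Build one row per record that actually has the key.
rows = [
    {"uuid": hit.get("uuid"),
     **collect_by_category(hit["nice_to_have"], "nice_to_have")}
    for hit in hits
    if "nice_to_have" in hit
]
df = pd.DataFrame(rows)
print(df)
```

Calling the helper with `hit["must_have"]` and the prefix `"must_have"` would give the corresponding columns for that key in the same way.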

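Yes, I found that too, and I should have mentioned it in the question. But is there a way to pass nice_to_have as the root in json_normalize, so that I can unnest that field as well?

Oh, nice example using lambda and filter! Thanks.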