Python: normalize a nested JSON response with inconsistent keys and arbitrary nesting depth into a DataFrame

python json pandas

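I am trying to convert a JSON response into a pandas DataFrame so that I can use it for various other operations. I have tried the approaches listed above, but I can't use json_normalize effectively. If I pass the key I need as the record_path argument, it raises an error, because only some of the records have that key. I don't want to iterate over the whole JSON, compare the keys one by one, and then rebuild my own dictionary object. I want a DataFrame with uuid plus nice_to_have_skills, nice_to_have_skills_path and nice_to_have_experience. These nice_to_have attributes can be found in the JSON object under the nice_to_have and operands keys.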
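A minimal reproduction of the error described above follows. It uses made-up hits, because the real sample response is not shown here, and only one of those hits carries nice_to_have. The data and the name hits are hypothetical. The exact KeyError message differs between pandas versions, so the sketch relies only on the exception type.

```python
import pandas as pd

# Hypothetical, trimmed-down hits: only the second one has "nice_to_have".
hits = [
    {"uuid": "u1", "text_about": "some text"},
    {"uuid": "u1", "nice_to_have": [
        {"operator": "AND", "operands": [
            {"category": "Skill",
             "values": [{"value": "MySQL", "clusters": []}]},
        ]},
    ]},
]

try:
    pd.json_normalize(hits, record_path=["nice_to_have"])
    error = None
except KeyError as exc:  # raised because hits[0] lacks the record_path key
    error = exc

print(repr(error))
```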

Below is a sample JSON response:

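I want to extract something like this into my DataFrame:

"nice_to_have_skill" -> ["User Research", "Wireframing/Prototyping"]

Here nice_to_have_skill would be the column name, and ["User Research", "Wireframing/Prototyping"] would be one value in that column.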
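For reference, here is a sketch of the target shape using hypothetical data. pandas stores a Python list as a single object-dtype cell, so a frame with one row per uuid and list-valued nice_to_have_* columns can be represented directly. The frame names target and long_form are illustrative only.

```python
import pandas as pd

# Hypothetical target frame: one row per uuid, list-valued "nice_to_have_*" cells.
target = pd.DataFrame(
    [{"uuid": "u1", "nice_to_have_skill": ["User Research", "Wireframing/Prototyping"]}]
)

cell = target.loc[0, "nice_to_have_skill"]
print(type(cell).__name__, cell)  # list ['User Research', 'Wireframing/Prototyping']

# explode() turns each list element into its own row if a long layout is needed.
long_form = target.explode("nice_to_have_skill")
print(len(long_form))  # 2
```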

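Edit: How do I handle the JSON if it has arbitrary depth? For example,

{"nice_to_have": [{"operator": "AND", "operators": [{"operator": "OR", "operands": [{"category": "Language", "values": [{"value": "Korean", "clusters": []}]}]}]}], "company_name": "Frame", "company_role": ["Manufacturing", "Supply Chain/Procurement"]}

is part of the JSON, and it can have any level of nesting.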
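json_normalize needs a fixed record_path, so arbitrary depth seems to call for a small recursive walk. The sketch below makes two assumptions, based only on the fragments shown in this thread. First, a leaf is any dict that has both a category key and a values key. Second, groups nest through list-valued keys; the edit uses both operators and operands. The helpers iter_leaf_operands and flatten_hit are hypothetical, as is the column naming rule nice_to_have_<lowercased category>; none of them is part of pandas.

```python
from collections import defaultdict

import pandas as pd


def iter_leaf_operands(node):
    """Yield every leaf operand, i.e. a dict with both "category" and "values".

    The function descends into every list-valued key, so it does not matter
    whether a group uses "operands" or "operators", or how deep it nests.
    """
    if isinstance(node, list):
        for item in node:
            yield from iter_leaf_operands(item)
    elif isinstance(node, dict):
        if "category" in node and "values" in node:
            yield node
            return  # stop here: cluster dicts further down also carry "category"
        for value in node.values():
            if isinstance(value, list):
                yield from iter_leaf_operands(value)


def flatten_hit(hit, root_key="nice_to_have"):
    """Build one row: the hit's uuid plus one list-valued column per category."""
    row = {"uuid": hit.get("uuid")}
    collected = defaultdict(list)
    for leaf in iter_leaf_operands(hit.get(root_key, [])):
        column = f"{root_key}_{leaf['category'].lower()}"
        collected[column].extend(v["value"] for v in leaf["values"])
    row.update(collected)
    return row


# Hypothetical hits; the first one uses the deeper shape from the edit.
hits = [
    {"uuid": "u1", "nice_to_have": [{"operator": "AND", "operators": [
        {"operator": "OR", "operands": [
            {"category": "Language",
             "values": [{"value": "Korean", "clusters": []}]},
        ]},
    ]}]},
    {"uuid": "u2", "nice_to_have": [{"operator": "AND", "operands": [
        {"category": "Skill", "values": [
            {"value": "User Research", "clusters": []},
            {"value": "Wireframing/Prototyping", "clusters": []},
        ]},
    ]}]},
    {"uuid": "u2", "text_about": "some text"},  # no nice_to_have at all
]

df = pd.DataFrame([flatten_hit(h) for h in hits if "nice_to_have" in h])
print(df)
```

Hits without nice_to_have are skipped, and any category a hit lacks comes out as NaN. Calling flatten_hit(hit, root_key="must_have") should handle must_have the same way, provided it shares this structure.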

Passing d['hits'] to json_normalize results in:

import json
import pandas as pd

d = json.loads(json_text)  # json_text: the raw response body

In [136]: %time pd.json_normalize(d['hits'])
CPU times: user 2.1 ms, sys: 41 µs, total: 2.14 ms
Wall time: 2.12 ms
Out[136]: 
                                   uuid text_about                                           objectID      search_space is_searchspace                                       nice_to_have                                          must_have          some key          some_key
0  00000000-0000-0000-0000-000000000000  some_text    00000000-0000-0000-0000-000000000000-text_about               NaN            NaN                                                NaN                                                NaN               NaN               NaN
1  00000000-0000-0000-0000-000000000000        NaN  00000000-0000-0000-0000-000000000000-search_space  some json object           True                                                NaN                                                NaN               NaN               NaN
2  00000000-0000-0000-0000-000000000000        NaN  00000000-0000-0000-0000-000000000000-nice_to_have               NaN            NaN  [{'operator': 'AND', 'operands': [{'category':...                                                NaN               NaN               NaN
3  00000000-0000-0000-0000-000000000000        NaN     00000000-0000-0000-0000-000000000000-must_have               NaN            NaN                                                NaN  [{'operator': 'AND', 'operands': [{'category':...               NaN               NaN
4                                   NaN        NaN                                                NaN               NaN            NaN                                                NaN                                                NaN  some json object               NaN
5  10000000-0000-0000-0000-000000000001  some text    10000000-0000-0000-0000-000000000001-text_about               NaN            NaN                                                NaN                                                NaN               NaN               NaN
6  10000000-0000-0000-0000-000000000001        NaN  10000000-0000-0000-0000-000000000001-search_space  some json object           True                                                NaN                                                NaN               NaN               NaN
7  10000000-0000-0000-0000-000000000001        NaN  10000000-0000-0000-0000-000000000001-nice_to_have               NaN            NaN  [{'operator': 'AND', 'operands': [{'category':...                                                NaN               NaN               NaN
8  10000000-0000-0000-0000-000000000001        NaN     10000000-0000-0000-0000-000000000001-must_have               NaN            NaN                                                NaN  [{'operator': 'AND', 'operands': [{'category':...               NaN               NaN
9                                   NaN        NaN                                                NaN               NaN            NaN                                                NaN                                                NaN               NaN  some json object

From there, you can select the nice_to_have entries:

df = pd.json_normalize(d, record_path=['hits'])

In [263]: %time df['nice_to_have'].dropna().sum()
CPU times: user 705 µs, sys: 11 µs, total: 716 µs
Wall time: 713 µs
Out[263]: 
[{'operator': 'AND',
  'operands': [{'category': 'Skill',
    'values': [{'value': 'MySQL ', 'clusters': []}]}]},
 {'operator': 'AND',
  'operands': [{'category': 'Skill',
    'values': [{'value': 'Frontend Programming Language ',
      'clusters': [{'key': 'Programming Language~>Frontend Programming Language',
        'name': 'Frontend Programming Language',
        'path': ['Programming Language', 'Frontend Programming Language'],
        'uuid': 'e8c5cc6c-d92b-4098-8965-41e6818fe337',
        'category': 'skill',
        'pretty_lineage': ['Programming Language']}]}]}]}]
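Calling sum() on an object column of lists concatenates the lists into one flat list, which is why the call above works. The catch is that the link back to each hit's uuid is lost. The sketch below uses a hypothetical stand-in for df and shows DataFrame.explode as one alternative that keeps that link.

```python
import pandas as pd

# Hypothetical frame shaped like the json_normalize(d['hits']) output above.
df = pd.DataFrame({
    "uuid": ["u1", "u1", "u2"],
    "nice_to_have": [
        None,
        [{"operator": "AND", "operands": []}],
        [{"operator": "AND", "operands": []}, {"operator": "OR", "operands": []}],
    ],
})

# Summing lists concatenates them: one flat list, no uuid attached.
combined = df["nice_to_have"].dropna().sum()

# explode() gives one row per group and keeps each group's uuid.
per_group = df.dropna(subset=["nice_to_have"]).explode("nice_to_have")
print(len(combined), per_group["uuid"].tolist())  # 3 ['u1', 'u2', 'u2']
```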


Hope this helps.

Edit:

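In response to your comment: the main problem with this JSON is that its nesting levels are inconsistent. Because of that, the normalization cannot be performed and a KeyError is raised.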

Here is a workaround that gets you the nice_to_have entries:

f = list(filter(lambda x: 'nice_to_have' in x, d['hits']))

>>> pd.json_normalize(f, ['nice_to_have', 'operands', 'values', 'clusters'])

                                                 key                           name                                               path                                  uuid category          pretty_lineage
0  Programming Language~>Frontend Programming Lan...  Frontend Programming Language  [Programming Language, Frontend Programming La...  e8c5cc6c-d92b-4098-8965-41e6818fe337    skill  [Programming Language]

From there you can get the values you want. A similar workaround can be applied to get must_have.
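One way to get from there to the shape asked for in the question is to combine record_path with meta. With that combination, each extracted value keeps its hit's uuid and its operand's category, and a groupby then rebuilds the list-valued columns. This is a sketch on hypothetical data. Like the workaround above, it only works when every hit that has nice_to_have uses the same fixed depth (nice_to_have -> operands -> values). The names with_key, long_df and wide are illustrative.

```python
import pandas as pd

# Hypothetical hits: only some have "nice_to_have", and every group sits at the
# fixed depth nice_to_have -> operands -> values.
hits = [
    {"uuid": "u1", "text_about": "some text"},
    {"uuid": "u1", "nice_to_have": [{"operator": "AND", "operands": [
        {"category": "Skill", "values": [
            {"value": "User Research", "clusters": []},
            {"value": "Wireframing/Prototyping", "clusters": []},
        ]},
    ]}]},
    {"uuid": "u2", "nice_to_have": [{"operator": "AND", "operands": [
        {"category": "Skill", "values": [{"value": "MySQL", "clusters": []}]},
    ]}]},
]

# Same idea as filter(lambda x: 'nice_to_have' in x, ...): drop hits without the key.
with_key = [h for h in hits if "nice_to_have" in h]

# One row per entry in "values". meta copies "uuid" from the hit and
# "category" from the operand that owns each "values" list.
long_df = pd.json_normalize(
    with_key,
    record_path=["nice_to_have", "operands", "values"],
    meta=["uuid", ["nice_to_have", "operands", "category"]],
)

# Regroup into one row per uuid with list-valued nice_to_have_<category> columns.
wide = (
    long_df.assign(
        column="nice_to_have_"
        + long_df["nice_to_have.operands.category"].str.lower()
    )
    .groupby(["uuid", "column"])["value"]
    .agg(list)
    .unstack("column")
    .reset_index()
)
print(wide)
```

Note that record_path=[..., "clusters"] combined with meta=["uuid"] would clash, because each cluster record has its own uuid. pandas then asks for a meta_prefix, which is why this sketch stops at the values level.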
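Yes, I found that too, and I should have mentioned it in the question. But is there a way to pass nice_to_have as the root in json_normalize, so that I can also get rid of the check for that field? Oh, and nice example using lambda and filter! Thanks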
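On passing nice_to_have as the root: json_normalize does not appear to offer that directly, but the data can be re-rooted first. This is a sketch on hypothetical data, not a built-in option. The comprehension turns every nice_to_have group into its own top-level record tagged with its hit's uuid. Because dict.get uses a default of [], hits without the key contribute nothing, so no separate filter step is needed. It still assumes the fixed operands -> values depth. The names roots and long_df are illustrative.

```python
import pandas as pd

hits = [
    {"uuid": "u1", "text_about": "some text"},
    {"some key": "some json object"},  # no uuid and no nice_to_have
    {"uuid": "u1", "nice_to_have": [{"operator": "AND", "operands": [
        {"category": "Skill", "values": [
            {"value": "User Research", "clusters": []},
            {"value": "Wireframing/Prototyping", "clusters": []},
        ]},
    ]}]},
]

# Re-root: one record per nice_to_have group, tagged with its hit's uuid.
roots = [
    {"uuid": hit.get("uuid"), **group}
    for hit in hits
    for group in hit.get("nice_to_have", [])
]

long_df = pd.json_normalize(
    roots,
    record_path=["operands", "values"],
    meta=["uuid", "operator", ["operands", "category"]],
)
print(long_df)
```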