JSON to long table in Python


I have the following JSON defined as a dict in Python:

specs = {
    "state/79900116649511": {
        "active": {
            "window_size": 10000,
            "batch": True,
            "n_col": 50,
            "n_row": 200
        },
        "voltan": {
            "window_size": 50
        },
        "cura": {
            "window_size": 100
        }
    },
    "state/79900216649511": {
        "active": {
            "window_size": 10000,
            "batch": True,
            "n_col": 50,
            "n_row": 200
        },
        "voltan": {
            "window_size": 50
        },
        "cura": {
            "window_size": 100
        }
    }
}

I want to create a long table from this dictionary. The output should be:

state           variable  window_size  batch  n_col  n_row
79900116649511  active    10000        True   50     200
79900116649511  voltan    50           null   null   null
79900116649511  cura      100          null   null   null
79900216649511  active    10000        True   50     200
79900216649511  voltan    50           null   null   null
79900216649511  cura      100          null   null   null

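There is no one-line solution, because you are using the first-level keys as the "state" column and the second-level keys as the "variable" column, while only the remaining keys become value columns, and those may be incomplete.

So you need at least two "loops", with special rules for handling those keys.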
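The two loops described above can be sketched as follows. This is a minimal illustration rather than part of the original answer: the helper name specs_to_long is hypothetical, and it assumes every top-level key has the form "state/<id>".

```python
import pandas as pd

specs = {
    "state/79900116649511": {
        "active": {"window_size": 10000, "batch": True, "n_col": 50, "n_row": 200},
        "voltan": {"window_size": 50},
        "cura": {"window_size": 100},
    },
    "state/79900216649511": {
        "active": {"window_size": 10000, "batch": True, "n_col": 50, "n_row": 200},
        "voltan": {"window_size": 50},
        "cura": {"window_size": 100},
    },
}


def specs_to_long(specs):
    """Hypothetical helper: one output row per (state, variable) pair."""
    rows = []
    # Outer loop: first-level keys become the "state" column.
    for state_key, variables in specs.items():
        state = state_key.split("/", 1)[-1]  # drop the "state/" prefix
        # Inner loop: second-level keys become the "variable" column.
        for variable, settings in variables.items():
            # Remaining keys become value columns; absent ones end up as NaN.
            rows.append({"state": state, "variable": variable, **settings})
    return pd.DataFrame(rows)


long_df = specs_to_long(specs)
print(long_df)
```

Building a list of plain dicts and handing it to pd.DataFrame lets pandas fill in the missing keys, so no special case is needed for variables that only define window_size.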

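In general, to flatten/normalize a nested dict/JSON you can use the helper function json_normalize from the pandas library. For example, if you only need the top-level keys of the JSON as the index, you can do the following: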

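import pandas as pd

dfs = []
for key in specs.keys():
    # Flatten everything below one top-level key into a single row.
    df = pd.json_normalize(specs[key])
    # Use the top-level key as that row's index label.
    df.index = [key]
    dfs.append(df)
# Stack the one-row frames into a single table.
result = pd.concat(dfs)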

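This gives a wide table with one row per top-level key and flattened column names such as active.window_size and voltan.window_size.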
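The same helper can also be applied one level deeper to get closer to the long layout asked for. The sketch below is an illustration under assumptions, not the original answer's code: it calls pd.json_normalize on each innermost dict, and the names frames and normalized_long are made up for the example.

```python
import pandas as pd

specs = {
    "state/79900116649511": {
        "active": {"window_size": 10000, "batch": True, "n_col": 50, "n_row": 200},
        "voltan": {"window_size": 50},
        "cura": {"window_size": 100},
    },
    "state/79900216649511": {
        "active": {"window_size": 10000, "batch": True, "n_col": 50, "n_row": 200},
        "voltan": {"window_size": 50},
        "cura": {"window_size": 100},
    },
}

frames = []
for state_key, variables in specs.items():
    for variable, settings in variables.items():
        # Each innermost dict becomes a one-row frame.
        row = pd.json_normalize(settings)
        # Prepend the two key columns (assumes keys look like "state/<id>").
        row.insert(0, "variable", variable)
        row.insert(0, "state", state_key.split("/", 1)[-1])
        frames.append(row)

# concat takes the union of the columns and fills the gaps with NaN.
normalized_long = pd.concat(frames, ignore_index=True)
print(normalized_long)
```

For settings as flat as these, json_normalize adds little over building the rows directly; it would matter more if the innermost dicts were themselves nested.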

This solution is reasonably simple and it works:

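import pandas as pd

# Rows are the "state/..." keys, columns are the variables; each cell holds a dict.
df = pd.DataFrame.from_dict(specs, orient='index')
# One row per (variable, state) pair; the dicts end up in column 0.
df = df.unstack(1).reset_index()
df = df.rename(columns={'level_0': 'variable', 'level_1': 'state'})
# The first row is an "active" entry, which carries all four keys.
columns = df[0].iloc[0].keys()
for i in columns:
    # Keys missing from a dict come back as None/NaN.
    df[i] = df[0].apply(lambda x: x.get(i, None))

df = df.drop(columns=0)
df['state'] = df['state'].str.replace('state/', '', regex=False)
# A stable sort keeps active/voltan/cura in order within each state.
df = (df[['state', 'variable', 'window_size', 'batch', 'n_col', 'n_row']]
      .sort_values(by=['state'], kind='stable')
      .reset_index(drop=True))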

Result:

            state variable  window_size batch  n_col  n_row
0  79900116649511   active        10000  True   50.0  200.0
1  79900116649511   voltan           50  None    NaN    NaN
2  79900116649511     cura          100  None    NaN    NaN
3  79900216649511   active        10000  True   50.0  200.0
4  79900216649511   voltan           50  None    NaN    NaN
5  79900216649511     cura          100  None    NaN    NaN

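Sure, but that is not the expected output.

Well, if I defined all the columns for every "state/variable" combination, it should work, right?

Could you print the output? Also, please include the import statements. Thanks.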