Python: how to convert a nested JSON structure with varying lists (as dictionary values) into a DataFrame


I converted a JSON file to a DataFrame and ended up with a column "Structure_value" whose values are lists of dictionaries:

                   Structure_value
[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}]
[{'Room': [6], 'Length': 22}]
[{'Room': [6,6], 'Length': 8}]
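I need to split it into the following four columns:

Structure_value_room_1, Structure_value_length_1, Structure_value_room_2, Structure_value_length_2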

The output should look like this:

   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  \
0                       6                         7                     6.0   
1                       6                        22                     NaN   
2                       6                         8                     6.0   

   Structure_value_length_2  
0                       7.0  
1                       NaN  
2                       8.0  
How do I handle the case where one attribute holds several values in a list, and each value has to be split out into its own columns?


P.S.: I can handle a case like this:
[{'Room':[6],'Length':7},{'Room':[6],'Length':7}]
but I cannot handle this one:
[{'Room':[6,6],'Length':8}]
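One way to handle both shapes at once is to turn every cell into a flat sequence of (room, length) pairs, repeating a dict's Length for each value in its Room list, and then number the pairs. This is a minimal sketch, assuming the column holds Python lists of dicts exactly as shown above and that Length is a single number per dict; the helper name split_structure is hypothetical:

```python
import pandas as pd

# The Structure_value column from the question, rebuilt by hand.
df = pd.DataFrame({'Structure_value': [
    [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    [{'Room': [6], 'Length': 22}],
    [{'Room': [6, 6], 'Length': 8}],
]})


def split_structure(cell):
    """Flatten one cell into numbered room/length entries."""
    row = {}
    k = 0
    for d in cell:
        # Treat a bare number like a one-element Room list.
        rooms = d['Room'] if isinstance(d['Room'], list) else [d['Room']]
        for room in rooms:
            k += 1  # every Room value gets its own numbered pair
            row[f'Structure_value_room_{k}'] = room
            row[f'Structure_value_length_{k}'] = d['Length']  # repeat Length
    return row


rows = [split_structure(cell) for cell in df['Structure_value']]
# Rows with fewer pairs get NaN in the missing columns.
result = pd.DataFrame(rows, index=df.index)
print(result)
```

The values match the expected output in the question. Because the column names are generated from the number of pairs, a row with three rooms would simply add Structure_value_room_3 and Structure_value_length_3.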
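If we are talking about this particular data structure, I hope this helps.

[Images: source data; normalized]
You said this data structure works for your further processing.

I could not process your Structure_value representation directly as a JSON file, and I don't know whether it represents many individual files. So I saved [{'Room':[6],'Length':7},{'Room':[6],'Length':7}] as file 1 (house1.json), [{'Room':[6],'Length':22}] as file 2 (house2.json) and [{'Room':[6,6],'Length':8}] as file 3 (house3.json), written as valid JSON, that is, with double-quoted keys.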
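The sample values in the question are Python reprs with single quotes, which json.load rejects, so the test files have to be written as real JSON. A minimal sketch that creates the three files; the names house1.json to house3.json match the treat_json calls in this answer, and parsing the reprs with ast.literal_eval is an assumption about how the values were copied:

```python
import ast
import json

# The three Structure_value cells from the question, as copied (Python repr).
samples = [
    "[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}]",
    "[{'Room': [6], 'Length': 22}]",
    "[{'Room': [6, 6], 'Length': 8}]",
]

for i, text in enumerate(samples, start=1):
    data = ast.literal_eval(text)       # safely parse the Python literal
    with open(f'house{i}.json', 'w') as f:
        json.dump(data, f)              # writes double-quoted, valid JSON

with open('house3.json') as f:
    print(json.load(f))                 # [{'Room': [6, 6], 'Length': 8}]
```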

import json

import pandas as pd

COLUMNS = ['Structure_value_room_1', 'Structure_value_length_1',
           'Structure_value_room_2', 'Structure_value_length_2']

# Treat the irregular structures: turn one dict such as
# {'Room': [6, 6], 'Length': 8} into (room, length) pairs,
# repeating Length for every value in the Room list
def process_structure(s):
    rooms = s['Room'] if isinstance(s['Room'], list) else [s['Room']]
    return [(room, s['Length']) for room in rooms]

# Open and treat one JSON file holding a list of such dicts
def treat_json(file):
    with open(file, 'r') as f:
        structures = json.load(f)

    load_df = []
    for d in structures:
        for room, length in process_structure(d):
            load_df.extend([room, length])

    # If the row is not complete, pad it with None
    while len(load_df) < len(COLUMNS):
        load_df.append(None)

    return pd.DataFrame([load_df], columns=COLUMNS)
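Another approach works on the DataFrame column directly: when a row holds a single dict whose Room list has several values, expand it into one dict per room (repeating its Length), then spread each list across columns:

import pandas as pd

# Split a lone dict with several Room values into one dict per room;
# rows that already hold several dicts are left unchanged
df['tmp'] = df['Structure_value'].apply(
    lambda x: [{'Room': [v], 'Length': x[0]['Length']} for v in x[0]['Room']]
    if len(x) == 1 and isinstance(x[0]['Room'], list)
    else x)
pd.DataFrame(df['tmp'].values.tolist())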

Out[2]:

     0                            1
0   {'Room': [6], 'Length': 7}    {'Room': [6], 'Length': 7}
1   {'Room': [6], 'Length': 22}   None
2   {'Room': [6], 'Length': 8}    {'Room': [6], 'Length': 8}
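That result still holds dicts in its cells. To reach the four requested columns, each cell can be unpacked into its room and length. A minimal sketch, assuming the DataFrame of dicts printed above (rebuilt here by hand as tmp so the snippet runs on its own) and that after the expansion every Room list holds exactly one value:

```python
import pandas as pd

# The DataFrame of dicts shown above (None where a row has fewer dicts).
tmp = pd.DataFrame([
    [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    [{'Room': [6], 'Length': 22}, None],
    [{'Room': [6], 'Length': 8}, {'Room': [6], 'Length': 8}],
])

out = pd.DataFrame(index=tmp.index)
for pos in tmp.columns:
    k = pos + 1  # columns 0, 1, ... become suffixes _1, _2, ...
    cells = tmp[pos]
    # Take the single room and the length; leave missing cells empty.
    out[f'Structure_value_room_{k}'] = cells.map(
        lambda d: d['Room'][0] if isinstance(d, dict) else None)
    out[f'Structure_value_length_{k}'] = cells.map(
        lambda d: d['Length'] if isinstance(d, dict) else None)

print(out)
```

The isinstance check is what lets the None padding produced by the DataFrame constructor pass through as a missing value instead of raising a TypeError.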
Running treat_json from the file-based answer on the three files gives:

treat_json('house3.json')
    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         8

[1 rows x 4 columns]

treat_json('house2.json')
    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                      None

[1 rows x 4 columns]

treat_json('house1.json')

    Structure_value_room_1  ...  Structure_value_length_2
0                       6  ...                         7

[1 rows x 4 columns]
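Since treat_json returns a one-row DataFrame per file, the rows can be stacked into a single table shaped like the desired output with pd.concat. A minimal sketch; the three one-row frames below are hand-written stand-ins holding the values the question expects for each file, not captured treat_json output:

```python
import pandas as pd

cols = ['Structure_value_room_1', 'Structure_value_length_1',
        'Structure_value_room_2', 'Structure_value_length_2']

# Stand-ins for treat_json('house1.json') ... treat_json('house3.json').
per_file = [
    pd.DataFrame([[6, 7, 6, 7]], columns=cols),
    pd.DataFrame([[6, 22, None, None]], columns=cols),
    pd.DataFrame([[6, 8, 6, 8]], columns=cols),
]

# ignore_index=True renumbers the rows 0, 1, 2 instead of 0, 0, 0.
combined = pd.concat(per_file, ignore_index=True)
print(combined)
```

In practice the list could be built as [treat_json(f'house{i}.json') for i in (1, 2, 3)].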