Python: How to convert a nested JSON structure with different lists (as dictionary values) into a DataFrame
I converted JSON into a DataFrame and ended up with a column "Structure_value" that contains the following values as lists of dictionaries:
Structure_value
[{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}]
[{'Room': [6], 'Length': 22}]
[{'Room': [6,6], 'Length': 8}]
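I need to split it into the following four columns:
Structure_value_room_1
Structure_value_length_1
Structure_value_room_2
Structure_value_length_2
The output should look like this: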
   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  Structure_value_length_2
0                       6                         7                     6.0                       7.0
1                       6                        22                     NaN                       NaN
2                       6                         8                     6.0                       8.0
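How can I handle the case where one attribute holds several values in a list, and those values need to be split into additional columns?
P.S.: I am able to handle a case like this:
[{'Room':[6],'Length':7},{'Room':[6],'Length':7}]
But I cannot handle this case: [{'Room':[6,6],'Length':8}]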
If we are talking about this particular data structure, I hope this helps.
Source data
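The source-data snippet itself is not shown here. A minimal sketch that rebuilds the frame from the three rows in the question could look like this; the variable name `df` is an assumption chosen to match the normalization code in this answer.

```python
import pandas as pd

# Rows copied from the question; each cell is a list of dictionaries.
# The name `df` is an assumption matching the normalization step.
df = pd.DataFrame({
    'Structure_value': [
        [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
        [{'Room': [6], 'Length': 22}],
        [{'Room': [6, 6], 'Length': 8}],
    ]
})

print(df.shape)
```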
Normalization
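You said that this data structure suits your later processing. I could not process your Structure_value representation as a single JSON file, and I do not know whether the rows represent many individual files. I used [{'Room':[6],'Length':7},{'Room':[6],'Length':7}] as file 1, [{'Room':[6],'Length':22}] as file 2, and [{'Room':[6,6],'Length':8}] as file 3.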
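To reproduce this setup, the three lists can be written to disk first. This is only a sketch: it assumes that file 1, file 2 and file 3 correspond to the names house1.json, house2.json and house3.json used in the calls at the end of this answer, and the dictionary name `samples` is made up for illustration.

```python
import json

# Hypothetical mapping of file names to the three lists from the question.
samples = {
    'house1.json': [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    'house2.json': [{'Room': [6], 'Length': 22}],
    'house3.json': [{'Room': [6, 6], 'Length': 8}],
}

# Write each list as its own JSON file in the current directory.
for name, structures in samples.items():
    with open(name, 'w') as f:
        json.dump(structures, f)
```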
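# When a cell holds a single dict whose 'Room' is a list, split it into
# one dict per room and repeat the length; otherwise keep the cell as is.
df['tmp'] = df['Structure_value'].apply(
    lambda x: [{'Room': [v], 'Length': x[0]['Length']} for v in x[0]['Room']]
    if len(x) == 1 and isinstance(x[0]['Room'], list)
    else x
)
pd.DataFrame(df['tmp'].values.tolist())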
Out[2]:
0 1
0 {'Room': [6], 'Length': 7} {'Room': [6], 'Length': 7}
1 {'Room': [6], 'Length': 22} None
2 {'Room': [6], 'Length': 8} {'Room': [6], 'Length': 8}
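The frame above still holds dictionaries, so one more step is needed to reach the four columns from the question. A possible sketch, starting from the normalized cells shown in the output; the helper name `to_row` and the variables `tmp` and `result` are made up for illustration.

```python
import pandas as pd

# Normalized cells as shown in the output above: one single-room dict per entry.
tmp = pd.Series([
    [{'Room': [6], 'Length': 7}, {'Room': [6], 'Length': 7}],
    [{'Room': [6], 'Length': 22}],
    [{'Room': [6], 'Length': 8}, {'Room': [6], 'Length': 8}],
])

def to_row(cell):
    """Flatten a list of single-room dicts into one labelled row (hypothetical helper)."""
    row = {}
    for n, d in enumerate(cell, start=1):
        row[f'Structure_value_room_{n}'] = d['Room'][0]
        row[f'Structure_value_length_{n}'] = d['Length']
    return row

# Rows with fewer entries get NaN in the missing columns.
result = pd.DataFrame([to_row(cell) for cell in tmp])
print(result)
```

Because the column names are generated from the position of each dictionary, the same helper would also cope with cells that hold more than two rooms.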
import json
import pandas as pd

COLUMNS = ['Structure_value_room_1', 'Structure_value_length_1',
           'Structure_value_room_2', 'Structure_value_length_2']

# treat the irregular structures: flatten one dict into a list
# such as ['Room', 6, 6, 'Length', 8]
def process_structure(s):
    specs = []
    for label, quantity in s.items():
        if isinstance(quantity, list):
            specs.append(label)
            specs.extend(quantity)
        elif isinstance(quantity, int):
            specs.append(label)
            specs.append(quantity)
    return specs

# open and treat jsons: build a one-row DataFrame per file
def treat_json(file):
    with open(file, 'r') as f:
        structures = json.load(f)
    load_df = []
    for struct in structures:
        rooms, val_len, label = [], None, None
        for item in process_structure(struct):
            if isinstance(item, str):
                label = item  # the values that follow belong to this label
            elif label == 'Room':
                rooms.append(item)
            elif label == 'Length':
                val_len = item
        # the length is only known once the whole dict has been read,
        # so emit one (room, length) pair per room afterwards
        for room in rooms:
            load_df.extend([room, val_len])
    # pad with None when fewer than two pairs were found
    while len(load_df) < 4:
        load_df.append(None)
    return pd.DataFrame([load_df], columns=COLUMNS)
treat_json('house3.json')
Structure_value_room_1 ... Structure_value_length_2
0 6 ... 8
[1 rows x 4 columns]
treat_json('house2.json')
Structure_value_room_1 ... Structure_value_length_2
0 6 ... None
[1 rows x 4 columns]
treat_json('house1.json')
Structure_value_room_1 ... Structure_value_length_2
0 6 ... 7
[1 rows x 4 columns]
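Each call returns a one-row frame, so the rows still have to be stacked to get the single table from the question. A rough sketch using `pd.concat`; the three frames here are stand-ins built from the expected output in the question rather than actual return values, and the names `frames` and `combined` are made up.

```python
import pandas as pd

COLUMNS = ['Structure_value_room_1', 'Structure_value_length_1',
           'Structure_value_room_2', 'Structure_value_length_2']

# Stand-ins for the three one-row results (values taken from the
# expected output in the question, not from real function calls).
frames = [
    pd.DataFrame([[6, 7, 6, 7]], columns=COLUMNS),
    pd.DataFrame([[6, 22, None, None]], columns=COLUMNS),
    pd.DataFrame([[6, 8, 6, 8]], columns=COLUMNS),
]

# ignore_index=True renumbers the rows 0, 1, 2 instead of repeating 0.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```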