Python: adding a column from a DataFrame to a deeply nested JSON at a specific object level
Tags: python, json, pandas, numpy, dataframe

Suppose I have a DataFrame df like this:
source tables columns data_type length RecordCount
src1 table1 col1 INT 4 71
src1 table1 col2 CHAR 2 71
src1 table2 col1 CHAR 2 43
src2 table1 col1 INT 4 21
src2 table1 col2 DATE 3 21
I need output similar to the following:
{
    "src1": {
        "table1": {
            "Record Count": 71,  # missing in my current code output
            "col1": {
                "type": "INT",
                "length": 4
            },
            "col2": {
                "type": "CHAR",
                "length": 2
            }
        },
        "table2": {
            "Record Count": 43,  # missing in my current code output
            "col1": {
                "type": "CHAR",
                "length": 2
            }
        }
    },
    "src2": {
        "table1": {
            "Record Count": 21,  # missing in my current code output
            "col1": {
                "type": "INT",
                "length": 4
            },
            "col2": {
                "type": "DATE",
                "length": 3
            }
        }
    }
}
Current code:
def make_nested(df):
    f = lambda: defaultdict(f)
    data = f()
    for row in df.to_numpy().tolist():
        t = data
        for index, r in enumerate(row[:-4]):
            t = t[r]
            if index == 1:
                t[row[-5]]: {
                    "Record Count": row[-1]
                }
        t[row[-4]] = {
            "type": row[-3],
            "length": row[-2]
        }
    return data
Here is another solution that uses groupby in two steps.
# First, group by ['source', 'tables'] to deal with the annoying 'Record Count'.
# Requires Python 3.5+ for the {**a, **b} dict merge;
# otherwise, another method to merge two dicts should be used.
df_new = df.groupby(['source', 'tables']).apply(
    lambda x: {
        **{'Record Count': x.iloc[0, -1]},
        **{x.iloc[i, -4]: {'type': x.iloc[i, -3], 'length': x.iloc[i, -2]}
           for i in range(len(x))},
    }
).reset_index()
After the first step, df_new looks like this:
source tables 0
0 src1 table1 {'Record Count': 71, 'col1': {'type': 'INT', 'length': 4}, 'col2': {'type': 'CHAR', 'length': 2}}
1 src1 table2 {'Record Count': 43, 'col1': {'type': 'CHAR', 'length': 2}}
2 src2 table1 {'Record Count': 21, 'col1': {'type': 'INT', 'length': 4}, 'col2': {'type': 'DATE', 'length': 3}}
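The step that turns df_new into the final string named output does not appear in the text here. As a minimal sketch only, assuming the second step just nests each per-table dict under its source and serializes the result, it could look like the following. The df_new literal mirrors the table printed above; the variable nested is a hypothetical name introduced for illustration.

```python
import json
import pandas as pd

# df_new as printed above: one dict per (source, tables) pair in column 0.
df_new = pd.DataFrame({
    "source": ["src1", "src1", "src2"],
    "tables": ["table1", "table2", "table1"],
    0: [
        {"Record Count": 71,
         "col1": {"type": "INT", "length": 4},
         "col2": {"type": "CHAR", "length": 2}},
        {"Record Count": 43,
         "col1": {"type": "CHAR", "length": 2}},
        {"Record Count": 21,
         "col1": {"type": "INT", "length": 4},
         "col2": {"type": "DATE", "length": 3}},
    ],
})

# Assumed second step: hang each table dict under its source.
nested = {}
for source, table, payload in zip(df_new["source"], df_new["tables"], df_new[0]):
    nested.setdefault(source, {})[table] = payload

# Serialize to a JSON-encoded string (compact, single line).
output = json.dumps(nested)
```

Building the nesting with a plain loop avoids a second groupby and keeps the result an ordinary dict, which can also be passed straight to json.dump with indent=4.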
The output is a JSON-encoded string. To get an indented version in a file:
import json
temp = json.loads(output)
with open('somefile', 'w') as f:
    json.dump(temp, f, indent=4)
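The round-trip through json.loads matters because output is already a string. As a small illustration, assuming output was produced by something like json.dumps or to_json, dumping that string again encodes it a second time, which puts everything on one line with a backslash before each quote. The names double_encoded and pretty are hypothetical.

```python
import json

data = {"src1": {"table1": {"Record Count": 71}}}
output = json.dumps(data)  # already a JSON-encoded str

# Dumping the str itself encodes it again: one line, escaped quotes.
double_encoded = json.dumps(output, indent=4)

# Parsing first and then dumping the dict gives the indented layout.
pretty = json.dumps(json.loads(output), indent=4)

print(double_encoded)
print(pretty)
```

The same applies to json.dump when writing to a file: pass it the parsed dict, not the encoded string.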
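Comments:

The line "for index, r in enumerate(row[:-4]):" should replace "for r in row[:-4]:" rather than one being nested inside the other.

After making that edit to the code, it looks like I still get the same original output, without the new Record Count information added to the JSON file.

Thanks, this works. However, when I dump this information to a file, it all appears on one line instead of the indented output I showed in my post. How can I fix the code to allow this? There is also a slash before every ", and I need to get rid of those.

@weovibewvoibweoivwoiv I added something to change the formatting. Also, to fix the current code directly, try changing t[row[-5]]: {"Record Count": row[-1]} to t["Record Count"] = row[-1].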
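For reference, here is a sketch of the question's make_nested with the fix suggested in the comments applied: the annotation-style line is replaced by a real assignment, placed after the walk down to the table level. It assumes the column order shown in the question (source, tables, columns, data_type, length, RecordCount); the sample df below is rebuilt from the question's table.

```python
import json
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table1", "table2", "table1", "table1"],
    "columns": ["col1", "col2", "col1", "col1", "col2"],
    "data_type": ["INT", "CHAR", "CHAR", "INT", "DATE"],
    "length": [4, 2, 2, 4, 3],
    "RecordCount": [71, 71, 43, 21, 21],
})


def make_nested(df):
    f = lambda: defaultdict(f)  # dict that creates nested levels on demand
    data = f()
    for row in df.to_numpy().tolist():
        t = data
        # Walk down source -> table (every column except the last four).
        for r in row[:-4]:
            t = t[r]
        # t is now the table-level dict: assign, do not annotate.
        t["Record Count"] = row[-1]
        t[row[-4]] = {"type": row[-3], "length": row[-2]}
    return data


nested = make_nested(df)
pretty = json.dumps(nested, indent=4)
print(pretty)
```

Since nested is a dict rather than an encoded string, it can be written directly with json.dump(nested, f, indent=4) and no json.loads step is needed.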