Python: how to convert a nested dictionary to a pd.DataFrame faster?
I have a JSON file that looks like this:
{
  "file": "name",
  "main": [{
    "question_no": "Q.1",
    "question": "what is ?",
    "answer": [{
        "user": "John",
        "comment": "It is defined as",
        "value": [
          {
            "my_value": 5,
            "value_2": 10
          },
          {
            "my_value": 24,
            "value_2": 30
          }
        ]
      },
      {
        "user": "Sam",
        "comment": "as John said above it simply means",
        "value": [
          {
            "my_value": 9,
            "value_2": 10
          },
          {
            "my_value": 54,
            "value_2": 19
          }
        ]
      }
    ],
    "closed": "no"
  }]
}
Expected result:
Question_no  question   my_value_sum  value_2_sum  user  comment
Q.1          what is ?  29            40           John  It is defined as
Q.1          what is ?  63            29           Sam   as John said above it simply means
What I tried was data = json_normalize(file_json, "main"), followed by a loop like this:
for ans, row in data.iterrows():
    ....
    ....
    df = df.append(the data)
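The problem with this approach is that it takes so long that my client will reject the solution. The main list has about 1200 items, and there are 450 JSON files to convert, so this intermediate conversion step takes almost an hour to complete.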
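Edit:
Is it possible to get the sums of my_value and value_2 as columns? (I have also updated the desired result.)

Select the list under the main key and pass the record_path and meta parameters: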
data = pd.json_normalize(file_json["main"],
                         record_path='answer',
                         meta=['question_no', 'question'])
print (data)
   user                             comment question_no   question
0  John                    It is defined as         Q.1  what is ?
1   Sam  as John said above it simply means         Q.1  what is ?
Then, if the column order matters, move the last N columns to the front:
N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print (data)
  question_no   question  user                             comment
0         Q.1  what is ?  John                    It is defined as
1         Q.1  what is ?   Sam  as John said above it simply means
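
For the follow-up about sums, one possible approach is sketched below. It is not part of the original answer. It assumes the structure shown in the question, where each answer carries a value list of dicts with the keys my_value and value_2. With record_path='answer', json_normalize leaves that list in a value column, so it can be summed row by row. The column names my_value_sum and value_2_sum are taken from the expected result.

```python
import pandas as pd

# Sample input copied from the question (a single item in "main").
file_json = {
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [
            {"user": "John", "comment": "It is defined as",
             "value": [{"my_value": 5, "value_2": 10},
                       {"my_value": 24, "value_2": 30}]},
            {"user": "Sam", "comment": "as John said above it simply means",
             "value": [{"my_value": 9, "value_2": 10},
                       {"my_value": 54, "value_2": 19}]},
        ],
        "closed": "no",
    }],
}

data = pd.json_normalize(file_json["main"],
                         record_path="answer",
                         meta=["question_no", "question"])

# Each cell of "value" is still a list of dicts; sum one key per row.
data["my_value_sum"] = data["value"].apply(
    lambda vals: sum(v["my_value"] for v in vals))
data["value_2_sum"] = data["value"].apply(
    lambda vals: sum(v["value_2"] for v in vals))

# Keep only the columns from the expected result, in that order.
data = data[["question_no", "question", "my_value_sum",
             "value_2_sum", "user", "comment"]]
print(data)
```

If some answers can lack the value key, the lambdas would need a default such as an empty list; that case is not covered here.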
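Your JSON is invalid.
I can't be sure without more details about the problem. But first try appending the data to lists, and only create the df at the end, like
df = pd.DataFrame(question_no, question, user, comment)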
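
The suggestion in this comment could look roughly like the sketch below: collect plain dicts in a list and build the DataFrame once at the end, instead of calling df.append inside a loop (each append returns a new DataFrame, and the method was removed in pandas 2.0). The helper name flatten_main and the variable names are hypothetical, and the code assumes the JSON structure shown in the question.

```python
import pandas as pd

def flatten_main(file_json):
    """Return one flat dict per answer (hypothetical helper)."""
    rows = []
    for item in file_json["main"]:
        for ans in item["answer"]:
            rows.append({
                "question_no": item["question_no"],
                "question": item["question"],
                "my_value_sum": sum(v["my_value"] for v in ans["value"]),
                "value_2_sum": sum(v["value_2"] for v in ans["value"]),
                "user": ans["user"],
                "comment": ans["comment"],
            })
    return rows

# Sample input copied from the question.
sample = {
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [
            {"user": "John", "comment": "It is defined as",
             "value": [{"my_value": 5, "value_2": 10},
                       {"my_value": 24, "value_2": 30}]},
            {"user": "Sam", "comment": "as John said above it simply means",
             "value": [{"my_value": 9, "value_2": 10},
                       {"my_value": 54, "value_2": 19}]},
        ],
        "closed": "no",
    }],
}

all_rows = []
for parsed in [sample]:  # in practice: one parsed dict per JSON file
    all_rows.extend(flatten_main(parsed))

# Build the DataFrame a single time, after all rows are collected.
df = pd.DataFrame(all_rows)
print(df)
```

With 450 files, the loop would iterate over the parsed content of each file (for example via json.load) and the DataFrame would still be created only once.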
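@datanovelnow @DominicD But then user and comment would be nested lists, right?
You are a lifesaver, bro! No wonder you have such a high reputation. Is it possible to get the sums of my_value and value_2 from the nested list value? I have edited the desired result. Please check it out.
@jezael Actually, I was about to post a question, but I had to wait 90 minutes before posting another one. Here is the link, where I slightly modified it to include a submitted key inside value to match my specific case. If you can, please answer there. That would make sense for the title and for people with the same issue.
@Derik81 - Thanks, but the JSON seems to be invalid, just tested it in.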