Python: How do I extract two values from a nested data sample into a DataFrame?

python pandas dictionary

I am using a dev set from Stanford (see Dev Set 2.0). The file is in JSON format. When I read the file it is a dictionary, but I convert it to a DataFrame:

import json
import pandas as pd

# Load the whole JSON file into a Python dict
with open("dev-v2.0.json", "r") as json_file:
    json_data = json.load(json_file)

df = pd.DataFrame.from_dict(json_data)
df = df[0:2]  # for this example, only a subset

All the information I need is in the df['data'] column. Each row holds a large amount of data in the following format:

{'title': 'Normans', 'paragraphs': [{'qas': [{'question': 'In what country is Normandy located?', 'id': '56ddde6b9a695914005b9628', 'answers': [{'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}], 'is_impossible': False}, {'question': 'When were the Normans in Normandy?', 'id': '56ddde6b9a695914005b9629', 'answers': [{'text': '10th and 11th centuries', 'answer_start': 94}, {'text': 'in the 10th and 11th centuries', 'answer_start': 87}
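To make the nesting easier to see, here is a minimal sketch that builds a tiny stand-in for the file and inspects one cell of the 'data' column. The `sample` dict is a hand-made assumption that mirrors the layout shown above; it is not the real dev-v2.0.json content.

```python
import pandas as pd

# Hypothetical, hand-made stand-in for dev-v2.0.json (same layout, tiny size).
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "In what country is Normandy located?",
                            "id": "56ddde6b9a695914005b9628",
                            "answers": [{"text": "France", "answer_start": 159}],
                            "is_impossible": False,
                        }
                    ],
                }
            ],
        }
    ],
}

df = pd.DataFrame.from_dict(sample)

# Each cell of the 'data' column is still a plain nested dict:
# article -> list of paragraphs -> list of question/answer records.
first = df["data"].iloc[0]
print(type(first).__name__)
print(list(first.keys()))
print(first["paragraphs"][0]["qas"][0]["question"])
```

So the DataFrame only wraps the top level of the JSON; the questions and answers still have to be pulled out of the nested lists.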

I want to extract all the questions and answers from every row of the DataFrame. Ideally, the output would look like this:

Question                                         Answer 
'In what country is Normandy located?'          'France'
'When were the Normans in Normandy?'            'in the 10th and 11th centuries'

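Apologies in advance that only a small portion is printed here... (which does not help with reproducing the problem).

Many thanks!

This should get you started.
I am not sure how to handle the case where the answers field is empty, so you may want to come up with a better solution. For example: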

"question": " After 1945, what challenged the British empire?", "id": "5ad032b377cf76001a686e0d", "answers": [], "is_impossible": true

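import json
import pandas as pd

with open("dev-v2.0.json", "r") as f:
    data = json.loads(f.read())

questions, answers = [], []
for i in range(len(data["data"])):
    for j in range(len(data["data"][i]["paragraphs"])):
        for k in range(len(data["data"][i]["paragraphs"][j]["qas"])):
            q = data["data"][i]["paragraphs"][j]["qas"][k]["question"]
            try:  # only take the first element since the rest of the values are duplicates?
                a = data["data"][i]["paragraphs"][j]["qas"][k]["answers"][0]["text"]
            except IndexError:  # when `"answers": []`
                a = "None"
            questions.append(q)
            answers.append(a)

d = {
    "Questions": questions,
    "Answers": answers
}
pd.DataFrame(d)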
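The same traversal can also be written by iterating over the nested lists directly instead of by index, which is easier to read. This is only a sketch: `extract_qa` and `sample` are hypothetical names, the second question in `sample` is made up to exercise the empty `"answers": []` case, and a real `None` is used there instead of the string "None" so that pandas treats it as missing.

```python
import pandas as pd


def extract_qa(squad):
    """Collect (question, first answer text) pairs from a SQuAD-style dict."""
    rows = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa["answers"]
                # Keep only the first answer; use None when the list is empty.
                text = answers[0]["text"] if answers else None
                rows.append({"Questions": qa["question"], "Answers": text})
    return pd.DataFrame(rows, columns=["Questions", "Answers"])


# Hypothetical miniature input with the same layout as dev-v2.0.json.
sample = {
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "In what country is Normandy located?",
                            "answers": [{"text": "France", "answer_start": 159}],
                            "is_impossible": False,
                        },
                        {
                            "question": "A made-up unanswerable question?",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ]
                }
            ],
        }
    ]
}

qa_df = extract_qa(sample)
print(qa_df)
```

With the real file, the call would be `extract_qa(data)` after loading `data` as in the code above. Missing answers can then be found with `qa_df["Answers"].isna()`.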

The following (SQuAD (Stanford Q&A) json to Pandas DataFrame) explains how to convert dev-v1.1.json to a DataFrame.
I think this is more of a data-quality question that falls under "how to parse a nested JSON file". Take a look at questions of that type, because the JSON file needs to be parsed before it is loaded into a DataFrame.

       Questions                                            Answers
0      In what country is Normandy located?                 France
1      When were the Normans in Normandy?                   10th and 11th centuries
2      From which countries did the Norse originate?        Denmark, Iceland and Norway
3      Who was the Norse leader?                            Rollo
4      What century did the Normans first gain their ...    10th century
...    ...                                                  ...
11868  What is the seldom used force unit equal to on...    sthène
11869  What does not have a metric counterpart?             None
11870  What is the force exerted by standard gravity ...    None
11871  What force leads to a commonly used unit of mass?    None
11872  What force is part of the modern SI system?          None

[11873 rows x 2 columns]
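For the "parse the nested JSON first" route, one option that may be worth trying is `pandas.json_normalize`, which can walk a path of nested lists and produce one row per question. The sketch below runs on a hand-made `sample` dict (an assumption that mirrors the file layout; the third question is made up to cover the empty-answers case), and the `first_answer` column name is hypothetical.

```python
import pandas as pd

# Hypothetical miniature input with the same layout as dev-v2.0.json.
sample = {
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "In what country is Normandy located?",
                            "answers": [{"text": "France", "answer_start": 159}],
                            "is_impossible": False,
                        },
                        {
                            "question": "When were the Normans in Normandy?",
                            "answers": [
                                {"text": "10th and 11th centuries", "answer_start": 94}
                            ],
                            "is_impossible": False,
                        },
                        {
                            "question": "A made-up unanswerable question?",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ]
                }
            ],
        }
    ]
}

# One row per entry in data[*]["paragraphs"][*]["qas"], keeping the article title.
flat = pd.json_normalize(
    sample["data"],
    record_path=["paragraphs", "qas"],
    meta=["title"],
)

# The 'answers' column still holds lists of dicts; keep the first text if any.
flat["first_answer"] = flat["answers"].apply(
    lambda answers: answers[0]["text"] if answers else None
)
print(flat[["title", "question", "first_answer"]])
```

Compared with the explicit loops, this keeps the other per-question fields (such as `is_impossible`) as columns, which can help when deciding how to treat questions without answers.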