Python: How do I extract two values from a nested data sample into a DataFrame?

I am using a development set from Stanford (see Dev Set v2.0). The file is in JSON format. When I read the file it is a dictionary, but I converted it to a DataFrame:

import json
import pandas as pd

# Read the JSON file; the context manager closes it automatically
with open("dev-v2.0.json", "r") as json_file:
    json_data = json.load(json_file)

df = pd.DataFrame.from_dict(json_data)
df = df[0:2]  # for this example, only a subset
All the information I need is in the df['data'] column. Each row holds a lot of data in the following format:

{'title': 'Normans', 'paragraphs': [{'qas': [{'question': 'In what country is Normandy located?', 'id': '56ddde6b9a695914005b9628', 'answers': [{'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}], 'is_impossible': False}, {'question': 'When were the Normans in Normandy?', 'id': '56ddde6b9a695914005b9629', 'answers': [{'text': '10th and 11th centuries', 'answer_start': 94}, {'text': 'in the 10th and 11th centuries', 'answer_start': 87}
I want to pull out all questions and answers from every row of the DataFrame. Ideally, the output would look like this:

Question                                         Answer 
'In what country is Normandy located?'          'France'
'When were the Normans in Normandy?'            'in the 10th and 11th centuries'
Apologies in advance: only a small portion of the data is printed here (which does not help with reproducing the problem).


Many thanks.

This should get you started.

I was not sure how to handle cases where the answers field is empty, so you may want to come up with a better solution. For example:

"question": " After 1945, what challenged the British empire?", "id": "5ad032b377cf76001a686e0d", "answers": [], "is_impossible": true

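import json
import pandas as pd

with open("dev-v2.0.json", "r") as f:
    data = json.load(f)  # json.load takes the file object, not f.read()

questions, answers = [], []
for i in range(len(data["data"])):
    for j in range(len(data["data"][i]["paragraphs"])):
        for k in range(len(data["data"][i]["paragraphs"][j]["qas"])):
            q = data["data"][i]["paragraphs"][j]["qas"][k]["question"]
            try:  # only take the first element, since the rest of the values are duplicates?
                a = data["data"][i]["paragraphs"][j]["qas"][k]["answers"][0]["text"]
            except IndexError:  # when `"answers": []`
                a = "None"
            questions.append(q)
            answers.append(a)

d = {
    "Questions": questions,
    "Answers": answers
}
pd.DataFrame(d)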
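The same walk over the nested structure can be written without index arithmetic, which also makes it easier to handle the empty `answers` case explicitly. The following is only a sketch under a few assumptions: `sample` is a tiny hand-made dictionary shaped like the file (not the real data), `squad_to_frame` is a hypothetical helper name, and a missing answer is stored as a real `None` next to the `is_impossible` flag instead of the string "None".

```python
import pandas as pd

# Hypothetical miniature sample shaped like dev-v2.0.json (not the real file)
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "context": "placeholder context",
                    "qas": [
                        {
                            "question": "In what country is Normandy located?",
                            "id": "56ddde6b9a695914005b9628",
                            "answers": [{"text": "France", "answer_start": 159}],
                            "is_impossible": False,
                        },
                        {
                            "question": " After 1945, what challenged the British empire?",
                            "id": "5ad032b377cf76001a686e0d",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}


def squad_to_frame(squad):
    """Flatten article -> paragraph -> qas into one row per question."""
    rows = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answer_list = qa["answers"]
                rows.append(
                    {
                        "Questions": qa["question"],
                        # Keep only the first answer text; None when the list is empty
                        "Answers": answer_list[0]["text"] if answer_list else None,
                        "is_impossible": qa.get("is_impossible", False),
                    }
                )
    return pd.DataFrame(rows, columns=["Questions", "Answers", "is_impossible"])


qa_df = squad_to_frame(sample)
```

With the real file, the call would presumably be `squad_to_frame(data)` after `json.load`. Keeping `is_impossible` as its own column lets you filter unanswerable questions later instead of comparing against a placeholder string.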


The following (SQuAD (Stanford Q&A) JSON to Pandas DataFrame) describes how to convert dev-v1.1.json to a DataFrame.

I think this is more of a data quality issue, and it falls under "how to parse a nested JSON file". Look at the type of the questions. The JSON file needs to be parsed before it is loaded into a DataFrame.
                                               Questions                      Answers
0                   In what country is Normandy located?                       France
1                     When were the Normans in Normandy?      10th and 11th centuries
2          From which countries did the Norse originate?  Denmark, Iceland and Norway
3                              Who was the Norse leader?                        Rollo
4      What century did the Normans first gain their ...                 10th century
...                                                  ...                          ...
11868  What is the seldom used force unit equal to on...                       sthène
11869           What does not have a metric counterpart?                         None
11870  What is the force exerted by standard gravity ...                         None
11871  What force leads to a commonly used unit of mass?                         None
11872        What force is part of the modern SI system?                         None

[11873 rows x 2 columns]
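As another way to parse the nested JSON before building the DataFrame, pandas can do the flattening itself through `pd.json_normalize` with a nested `record_path`. This is only a sketch: `sample` below is a small hand-made stand-in for the real file, and the names `flat` and `first_answer_text` are illustrative. `json_normalize` leaves the `answers` lists as they are, so the first answer text still has to be picked out in a separate step.

```python
import pandas as pd

# Hypothetical miniature sample shaped like dev-v2.0.json (not the real file)
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "context": "placeholder context",
                    "qas": [
                        {
                            "question": "In what country is Normandy located?",
                            "id": "56ddde6b9a695914005b9628",
                            "answers": [{"text": "France", "answer_start": 159}],
                            "is_impossible": False,
                        },
                        {
                            "question": " After 1945, what challenged the British empire?",
                            "id": "5ad032b377cf76001a686e0d",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}


def first_answer_text(answer_list):
    """Return the text of the first answer, or None for an empty list."""
    return answer_list[0]["text"] if answer_list else None


# One row per entry of data[*]["paragraphs"][*]["qas"], with the article title attached
flat = pd.json_normalize(sample["data"], record_path=["paragraphs", "qas"], meta=["title"])

flat["Answers"] = flat["answers"].map(first_answer_text)
flat = flat.rename(columns={"question": "Questions"})[["Questions", "Answers", "title"]]
```

The explicit loops in the answer above are arguably easier to follow; the `json_normalize` version is shorter and keeps per-article metadata such as `title` alongside each question.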