Python: Reading sub-level JSON data with Pandas

python json api pandas

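I am stuck reading sub-level data with Pandas.

Background:

I downloaded a series of data with the NYT Archive API and stored it in a JSON file, which actually contains a list of JSON objects.

Procedure:

I read the JSON file with the read_json method:

pandas_df = pd.read_json("data.json")

When I look at a sample of the result with head, it looks like this: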

 pandas_df.head()
    copyright  \
0  Copyright (c) 2013 The New York Times Company....   
1  Copyright (c) 2013 The New York Times Company....   
2  Copyright (c) 2013 The New York Times Company....   
3  Copyright (c) 2013 The New York Times Company....   
4  Copyright (c) 2013 The New York Times Company....   

                                            response  
0  {'docs': [{'subsection_name': None, 'slideshow...  
1  {'docs': [{'subsection_name': None, 'slideshow...  
2  {'docs': [{'subsection_name': None, 'slideshow...  
3  {'docs': [{'subsection_name': None, 'slideshow...  
4  {'docs': [{'subsection_name': None, 'slideshow...

I only need the information inside response, so I changed the code as follows:

print(pandas_df["response"].head())
0    {'docs': [{'subsection_name': None, 'slideshow...
1    {'docs': [{'subsection_name': None, 'slideshow...
2    {'docs': [{'subsection_name': None, 'slideshow...
3    {'docs': [{'subsection_name': None, 'slideshow...
4    {'docs': [{'subsection_name': None, 'slideshow...
Name: response, dtype: object

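Question:

How can I get at the data for the elements inside docs, such as subsection, slideshow, and so on? Can I view it in tabular form, for example as a DataFrame?

Please let me know if more information is needed.

Thanks

Edit 1:

Adding the first element from the JSON file. The file is too large to post in full, about 1 GB.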

{
  "copyright": "Copyright (c) 2013 The New York Times Company.  All Rights Reserved.",
  "response": {
    "meta": {
      "hits": 7652
    },
    "docs": [
      {
        "web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html",
        "snippet": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
        "lead_paragraph": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
        "abstract": null,
        "print_page": null,
        "blog": [],
        "source": "The New York Times",
        "multimedia": [
          {
            "width": 190,
            "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
            "height": 126,
            "subtype": "wide",
            "legacy": {
              "wide": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
              "wideheight": "126",
              "widewidth": "190"
            },
            "type": "image"
          },
          {
            "width": 600,
            "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
            "height": 346,
            "subtype": "xlarge",
            "legacy": {
              "xlargewidth": "600",
              "xlarge": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
              "xlargeheight": "346"
            },
            "type": "image"
          },
          {
            "width": 75,
            "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
            "height": 75,
            "subtype": "thumbnail",
            "legacy": {
              "thumbnailheight": "75",
              "thumbnail": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
              "thumbnailwidth": "75"
            },
            "type": "image"
          }
        ],
        "headline": {
          "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
          "kicker": "Tech Fix"
        },
        "keywords": [
          {
            "rank": "1",
            "is_major": "N",
            "name": "subject",
            "value": "Video Recordings, Downloads and Streaming"
          },
          {
            "rank": "2",
            "is_major": "N",
            "name": "subject",
            "value": "Television Sets and Media Devices"
          },
          {
            "rank": "1",
            "is_major": "Y",
            "name": "subject",
            "value": "Television"
          }
        ],
        "pub_date": "2016-01-01T05:00:00Z",
        "document_type": "multimedia",
        "news_desk": "Technology / Personal Tech",
        "section_name": "Technology",
        "subsection_name": "Personal Tech",
        "byline": {
          "person": [
            {
              "firstname": "Brian",
              "middlename": "X.",
              "lastname": "CHEN",
              "rank": 1,
              "role": "reported",
              "organization": ""
            }
          ],
          "original": "By BRIAN X. CHEN"
        },
        "type_of_material": "Interactive Feature",
        "_id": "57fdfb9895d0e022439c2b57",
        "word_count": null,
        "slideshow_credits": null
      }]}}

You should be able to extract all the elements under the docs list, which is nested in the response dictionary, into a DataFrame:

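import json
import pandas as pd

# Assumes data.json holds a single object shaped like the sample above
with open('data.json') as f:
    data = json.load(f)
# Each entry of response['docs'] becomes one row
df = pd.DataFrame(data['response']['docs'])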
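
With the plain DataFrame constructor, nested fields such as headline and byline stay as dictionaries inside a single column. As a possible refinement (not part of the original answer), pd.json_normalize can flatten nested dictionaries into dotted column names, and its record_path argument can expand a nested list such as keywords into one row per item. The sketch below uses a small made-up record that only mimics the shape of the sample; the URL, headline text, and _id value are placeholders.

```python
import pandas as pd

# Hypothetical, trimmed-down data shaped like the sample in the question.
data = {
    "response": {
        "docs": [
            {
                "_id": "doc-1",
                "web_url": "http://example.com/a",
                "section_name": "Technology",
                "headline": {"main": "Headline A", "kicker": "Tech Fix"},
                "byline": {"original": "By A. Writer"},
                "keywords": [
                    {"name": "subject", "value": "Television"},
                    {"name": "subject", "value": "Streaming"},
                ],
            }
        ]
    }
}

docs = data["response"]["docs"]

# Nested dicts become columns such as "headline.main" and "byline.original".
flat = pd.json_normalize(docs)
print(sorted(flat.columns))

# One row per keyword, carrying the parent document's _id along as metadata.
keywords = pd.json_normalize(docs, record_path="keywords", meta=["_id"])
print(keywords)
```

List-valued fields that are not expanded with record_path (here, keywords in the first call) remain as Python lists in their column.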

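Can you post the entire raw JSON for the first few rows?

Added, please take a look. I want to read most of the values in "docs".

The last line gives me an error: TypeError: list indices must be integers or slices, not str. Do you know why that happens? Is it because I am reading a file that contains multiple JSON objects in a list?

I modified the JSON input slightly by adding a closing bracket and two closing braces. Copy the exact JSON directly into a file and run my code again. It should work.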
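
The TypeError reported above is what Python raises when a list is indexed with a string, which fits the description of the file as a list of JSON objects rather than a single object. A minimal sketch of one way to handle both cases follows, assuming every element of the list has the same response -> docs shape as the posted sample. The helper name docs_to_frame and the in-memory sample are made up for illustration; in practice the input would come from json.load on the file.

```python
import pandas as pd


def docs_to_frame(data):
    """Build one DataFrame from a single API response or a list of them."""
    # Wrap a single response in a list so both cases share the same loop.
    responses = data if isinstance(data, list) else [data]
    # Collect every article from every response into one flat list.
    docs = [doc for item in responses for doc in item["response"]["docs"]]
    return pd.DataFrame(docs)


# Hypothetical stand-in for the result of json.load(f) on the real file.
sample = [
    {"response": {"docs": [{"_id": "a", "section_name": "Technology"}]}},
    {"response": {"docs": [{"_id": "b", "section_name": "World"},
                           {"_id": "c", "section_name": "Arts"}]}},
]

df = docs_to_frame(sample)
print(df.shape)  # (3, 2)
```

This still loads the whole file into memory, so for a file of about 1 GB it may be necessary to process the data in smaller pieces.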