A faster way to load JSON in Python

python json pandas

I have website logs saved as JSON, and I want to load them into pandas. The JSON has the following structure, with several levels of nested data:

{"settings":{"siteIdentifier":"site1"},
    "event":{"name":"pageview",
             "properties":[]},
    "context":{"date":"Thu Dec 01 2016 01:00:08 GMT+0100 (CET)",
               "location":{"hash":"",
                           "host":"aaa"},
               "screen":{"availHeight":876,
                         "orientation":{"angle":0,
                                        "type":"landscape-primary"}},
               "navigator":{"appCodeName":"Mozilla",
                            "vendorSub":""},
               "visitor":{"id": "unique_id"}},
    "server":{"HTTP_COOKIE":"uid",
              "date":"2016-12-01T00:00:09+00:00"}}
{"settings":{"siteIdentifier":"site2"},
    "event":{"name":"pageview",
             "properties":[]},
    "context":{"date":"Thu Dec 01 2016 01:00:10 GMT+0100 (CET)",
               "location":{"hash":"",
                           "host":"aaa"},
               "screen":{"availHeight":852,
                         "orientation":{"angle":90,
                                        "type":"landscape-primary"}},
               "navigator":{"appCodeName":"Mozilla",
                            "vendorSub":""},
               "visitor":{"id": "unique_id"}},
    "server":{"HTTP_COOKIE":"uid",
              "date":"2016-12-01T00:00:09+00:10"}}

The only working solution I have so far is:

import json

import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions used pandas.io.json.json_normalize

pd.set_option('expand_frame_repr', False)
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 30)

first = True
filename = "/path/to/file.json"
with open(filename, 'r') as f:
    for line in f:  # each line holds one complete JSON object
        data = json.loads(line)  # parse the JSON string into a Python dict
        if first:  # initialize the DataFrame from the first record
            df = json_normalize(data)  # flatten nested keys into dotted column names
            first = False
        else:  # add one row per record (DataFrame.append was removed in pandas 2.0)
            df = pd.concat([df, json_normalize(data)], ignore_index=True)
df.to_csv("2016-12-02.csv", index=False, encoding='utf-8')

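I have to read the file line by line because my JSON objects are simply pasted one after another rather than wrapped in a list. My code works, but it is very slow.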

What can I do to speed it up? I'm using pandas because it seemed like a good fit, but I'm open to other approaches.
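If you would rather skip pandas, the same conversion can be done with the standard library alone. The sketch below assumes one JSON object per line. flatten and json_lines_to_csv are hypothetical helpers written for illustration. flatten only approximates json_normalize's dotted column names, and list values such as "properties" are written out as their Python repr (e.g. "[]").

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Hypothetical helper: flatten nested dicts into dotted keys, roughly like
    pandas' json_normalize. Non-dict values (including lists) are kept as-is."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(src, dst):
    """Hypothetical helper: convert newline-delimited JSON from src to CSV on dst."""
    # Skip blank lines so json.loads never receives an empty string
    rows = [flatten(json.loads(line)) for line in src if line.strip()]
    # Union of all keys in first-seen order, so records with missing fields still fit
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(dst, fieldnames=fieldnames)  # missing keys are written as ""
    writer.writeheader()
    writer.writerows(rows)


# Usage with an in-memory stand-in for the log file (records trimmed from the question)
sample = io.StringIO(
    '{"settings": {"siteIdentifier": "site1"}, "event": {"name": "pageview", "properties": []}}\n'
    '{"settings": {"siteIdentifier": "site2"}, "event": {"name": "pageview", "properties": []}}\n'
)
out = io.StringIO()
json_lines_to_csv(sample, out)
print(out.getvalue())
```

When writing to a real file, open it with newline='' as the csv module documentation recommends. All rows are held in memory because the header needs the union of keys across every record.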

You can first collect all the JSON objects into a list and then normalize them in a single call:

with open(filename, 'r') as f:
    data = [json.loads(line) for line in f]
    df = json_normalize(data)
df.to_csv("2016-12-02.csv",index=False, encoding='utf-8')
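Below is a self-contained sketch of this answer. It uses io.StringIO as a stand-in for the log file and two records trimmed from the question's sample. It assumes pandas 1.0 or later (for pd.json_normalize) and, like the question's loop, that each JSON object sits on a single line. Skipping blank lines is an addition. For comparison, it also rebuilds the frame row by row, as the question does, to check that both approaches produce the same table.

```python
import io
import json

import pandas as pd


def load_json_lines(f):
    """Parse one JSON object per line, then flatten all of them in a single call."""
    # Skip blank lines (e.g. a trailing newline) so json.loads never sees ""
    records = [json.loads(line) for line in f if line.strip()]
    # One json_normalize call builds the whole DataFrame; nested keys become dotted columns
    return pd.json_normalize(records)


def load_json_lines_incrementally(f):
    """The question's pattern, for comparison: grow the DataFrame one row at a time."""
    df = None
    for line in f:
        if not line.strip():
            continue
        row = pd.json_normalize(json.loads(line))
        # Each concat returns a new frame that copies every row collected so far
        df = row if df is None else pd.concat([df, row], ignore_index=True)
    return df


# Stand-in for open(filename): two records trimmed from the question's sample
SAMPLE = (
    '{"settings": {"siteIdentifier": "site1"}, '
    '"context": {"screen": {"availHeight": 876, "orientation": {"angle": 0}}}}\n'
    '{"settings": {"siteIdentifier": "site2"}, '
    '"context": {"screen": {"availHeight": 852, "orientation": {"angle": 90}}}}\n'
)

df = load_json_lines(io.StringIO(SAMPLE))
print(df.columns.tolist())
# Both approaches yield the same table; only the amount of copying differs
print(df.equals(load_json_lines_incrementally(io.StringIO(SAMPLE))))
```

The speed-up comes from building the DataFrame once. In the row-by-row version, every concatenation (like the old DataFrame.append) copies all rows collected so far, so n lines cost on the order of n²/2 row copies. Calling json_normalize once on the list flattens each record a single time and builds the frame in one step.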

I get an error on 'df = json_normalize(data)': "TypeError: 'generator' object has no attribute '__getitem__'"

@harrypotfler OK, I wasn't sure what would happen because I don't use pandas, so I tried the slightly more efficient option first. I've edited the answer: it now uses square brackets, i.e. a list comprehension instead of a generator expression.

I just tested it, and it's amazingly fast! Thanks!
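The fix in the comments comes down to how a generator expression differs from a list comprehension. The error text quoted there is the Python 2 wording for indexing a generator (Python 3 instead says the object "is not subscriptable"). Below is a small stdlib-only sketch. squares_list and squares_gen are illustrative names, not part of the original code.

```python
def squares_list(n):
    # List comprehension: builds every element now; supports indexing and repeated iteration
    return [i * i for i in range(n)]


def squares_gen(n):
    # Generator expression: lazy and single-pass; it has no __getitem__, so no indexing
    return (i * i for i in range(n))


lst = squares_list(3)
print(lst[0], sum(lst), sum(lst))  # indexing works and a second pass sees the same data

gen = squares_gen(3)
try:
    gen[0]
except TypeError as exc:
    print("cannot index a generator:", exc)

print(sum(gen), sum(gen))  # the first pass consumes the generator; the second sees nothing
```

Building the list costs memory for every parsed record. In exchange, it gives code that expects indexing or multiple passes a sequence it can rely on.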