Python: iteratively writing JSON/dictionary objects to a file (one at a time)


python json dictionary


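I have a large for loop in which I create JSON objects, and I would like to stream-write the object from each iteration to a file. I want to be able to consume the file later in a similar fashion (reading one object at a time). My JSON objects contain newlines, so I can't simply dump each object as one line in the file. How can I achieve this?

To make it more concrete, consider the following: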

for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object 
    with open('file.json', 'a') as f:
        stream_dump(dict_obj, f)

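stream_dump is the function I am looking for.

Note that I don't want to build one big list and dump the whole list with something like json.dump(obj, file). I want to be able to append the object to the file on each iteration.

Thanks.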

You need to use a subclass of JSONEncoder and then proxy the build_dict function:

import json
from collections.abc import Sequence  # collections.Sequence was removed in Python 3.10


mycollection = [1, 2, 3, 4]


def build_dict(_id):
    d = dict()
    d['my_' + str(_id)] = _id
    return d


class SeqProxy(Sequence):
    """Sequence view that applies func to each element of coll on access."""

    def __init__(self, func, coll):
        super(SeqProxy, self).__init__()

        self.func = func
        self.coll = coll

    def __len__(self):
        return len(self.coll)

    def __getitem__(self, key):
        return self.func(self.coll[key])


class JsonEncoderProxy(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)


jsonencoder = JsonEncoderProxy()
collproxy = SeqProxy(build_dict, mycollection)


for chunk in jsonencoder.iterencode(collproxy):
    print(chunk)

Output:

[
{
"my_1"
:
1
}
,
{
"my_2"
:
2
}
,
{
"my_3"
:
3
}
,
{
"my_4"
:
4
}
]
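The chunks printed above can be written to a file object instead of the console. The following is a minimal sketch of that idea, not the answerer's code: write_chunks is a hypothetical helper name, the classes are compact restatements of the ones above, and an in-memory io.StringIO stands in for a real file. Note that default() calls list() on the proxy, so all dictionaries are built before encoding starts; only the serialization itself is emitted piece by piece.

```python
import io
import json
from collections.abc import Sequence


class SeqProxy(Sequence):
    """Apply func to each element of coll when it is accessed."""

    def __init__(self, func, coll):
        self.func = func
        self.coll = coll

    def __len__(self):
        return len(self.coll)

    def __getitem__(self, key):
        return self.func(self.coll[key])


class JsonEncoderProxy(json.JSONEncoder):
    def default(self, o):
        # Turn any Sequence the encoder does not know into a plain list.
        if isinstance(o, Sequence):
            return list(o)
        return super().default(o)


def write_chunks(proxy, fileobj):
    """Hypothetical helper: write each encoded chunk as soon as it is produced."""
    for chunk in JsonEncoderProxy().iterencode(proxy):
        fileobj.write(chunk)


buf = io.StringIO()  # stands in for open('file.json', 'w')
write_chunks(SeqProxy(lambda i: {'my_' + str(i): i}, [1, 2, 3, 4]), buf)
```

The resulting text is a single JSON array, so it has to be parsed as a whole when read back.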

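To read it back chunk by chunk, you need to use JSONDecoder and pass a callable as object_hook. When you call JSONDecoder.decode(json_string), this hook is called for each newly decoded object (each dict in the list).

Since you are generating the file yourself, you can simply write out one JSON object per line: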

for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object 
    with open('file.json', 'a') as f:
        f.write(json.dumps(dict_obj))
        f.write('\n')

Then read them back by iterating over the lines:

with open('file.json', 'r') as f:
    for line in f:
        dict_obj = json.loads(line)

This is not a great general-purpose solution, but it is a simple one if you are both the producer and the consumer.
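The one-object-per-line approach also works when string values contain newlines, because json.dumps escapes them as \n inside the string. Below is a small round-trip sketch under that assumption. stream_dump is the name used in the question; stream_load, the sample records, and the temporary file path are made up for illustration.

```python
import json
import os
import tempfile


def stream_dump(obj, fileobj):
    """Append one object as a single line of JSON."""
    fileobj.write(json.dumps(obj))
    fileobj.write('\n')


def stream_load(fileobj):
    """Yield one decoded object per non-empty line."""
    for line in fileobj:
        if line.strip():
            yield json.loads(line)


# Sample data (hypothetical); the first record has a newline inside a value.
records = [
    {'id': 1, 'text': 'first line\nsecond line'},
    {'id': 2, 'text': 'plain'},
]

path = os.path.join(tempfile.mkdtemp(), 'file.jsonl')
with open(path, 'a') as f:
    for record in records:
        stream_dump(record, f)

with open(path) as f:
    loaded = list(stream_load(f))
```

Opening the file once around the loop, rather than once per iteration, avoids repeated open/close calls while still appending one object at a time.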

Simplest solution:

Remove all whitespace characters from the JSON document:

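import string

def remove_whitespaces(txt):
    """Remove every whitespace character from txt (including inside string values)."""
    for ch in string.whitespace:
        txt = txt.replace(ch, '')  # str.replace needs the replacement argument
    return txt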

Obviously, you could also use json.dumps(json.loads(json_txt)) (by the way, this also verifies that the text is valid JSON).

Now you can write your documents to a file, one per line.
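The difference between the two variants matters when whitespace is part of the data. The sketch below uses a made-up sample document to show that re-serializing with json.dumps(json.loads(...)) yields a single line while keeping whitespace inside string values intact.

```python
import json

# Hypothetical pretty-printed input with whitespace inside a string value.
original = {'title': 'hello world', 'body': 'line one\nline two'}
json_txt = json.dumps(original, indent=4)

# Round-tripping drops the layout whitespace but keeps the values unchanged.
compact = json.dumps(json.loads(json_txt))
```

Stripping every whitespace character from the raw text, by contrast, would also turn 'hello world' into 'helloworld'.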

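Second solution:

Create an [AnyStr]IO stream, write a valid document into it (your documents being part of an object or a list), and then write the IO stream to a file (or upload it to the cloud).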
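One possible reading of this suggestion, sketched with io.StringIO: each object becomes an element of a JSON array that is assembled piece by piece in the stream. build_array_document is a hypothetical helper name, and the whole document still ends up in memory before it is written out.

```python
import io
import json


def build_array_document(objects):
    """Assemble a valid JSON array in a StringIO, one element at a time."""
    buf = io.StringIO()
    buf.write('[')
    for index, obj in enumerate(objects):
        if index:
            buf.write(',\n')  # separator between elements, none before the first
        buf.write(json.dumps(obj))
    buf.write(']')
    return buf.getvalue()


document = build_array_document({'my_' + str(i): i} for i in [1, 2, 3])
```

The returned string can then be written to a file in one call or handed to an upload function.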

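If I understand your question correctly, it seems you could write a separator line that your data does not contain, such as "----", on each iteration after you write the object. When reading, you start a new object whenever you see that separator.

Ah, I see. That would definitely work. I thought there might be other stream-processing solutions.

Great, thanks. Just one question: what does SeqProxy do?

Your collection does not return a dict for each item (you are calling build_dict for each item). When JSONEncoder requests the next item in the list to serialize, SeqProxy wraps your collection and returns the result of build_dict.

Correct me if I'm wrong: this solves two problems: (a) the proxy is needed to call the custom build_dict function on a specific subset of the collection; (b) the json module already provides chunk-by-chunk serialization through the iterencode function. I was focusing on (b), and I didn't understand the code until I realized it is all about (a).

What happens if whitespace is an integral part of the content?

Good observation! In any case, json.dumps(json.loads(json_txt)) is perfect in that case.

Why remove all whitespace? I don't see how this is connected to the OP. If you want the complete JSON dump on one line, use json.dump(..., indent=None) (actually, that is already the default). Newlines inside text nodes will still be escaped.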
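As an alternative to a hand-picked separator line, the standard library's JSONDecoder.raw_decode can pull consecutive objects out of a text even when each one is pretty-printed across several lines. This is a sketch that assumes the whole file content fits in memory; iter_concatenated is a hypothetical helper name and the sample text is made up.

```python
import json


def iter_concatenated(text):
    """Yield each JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two pretty-printed objects written one after the other.
text = json.dumps({'a': 1}, indent=2) + '\n' + json.dumps({'b': [1, 2]}, indent=2) + '\n'
objects = list(iter_concatenated(text))
```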