How to serialize data to a given file using the ujson library in Python

python python-2.7 python-3.x

I am using the code below to generate a JSON file:

f = open("test.txt", 'r')
buffer = f.readlines()
rec_cnt = 1
with open("test.json", "w") as ujson_file:
    for line in buffer:
        data_dict[rec_cnt] = {line}
        if rec_cnt == 100:
            ujson.dump(data_dict.values(), ujson_file)
            data_dict.clear()
        rec_cnt += 1
f.close()
ujson_file

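The problem is that whenever I dump the data in batches, each batch of records is wrapped in its own [], which results in an invalid JSON file.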
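To make the failure concrete, here is a minimal sketch. It uses the standard-library json module as a stand-in for ujson (an assumption, so the snippet runs without extra installs; ujson.dump and ujson.loads are called the same way for this purpose) and an in-memory buffer instead of test.json. Two batches dumped back to back produce two adjacent arrays, which a JSON parser does not accept as one document.

```python
import io
import json  # stand-in for ujson (assumption for illustration)

buf = io.StringIO()  # in-memory file used in place of test.json

json.dump(["Orange", "Apple"], buf)  # first batch
json.dump(["Kiwi", "Banana"], buf)   # second batch, written right after the first

written = buf.getvalue()
print(written)

# Try to read the result back as a single JSON document.
try:
    json.loads(written)
    parsed_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    parsed_ok = False

print(parsed_ok)
```

Each dump call emits a complete JSON value, so the file ends up holding several top-level values instead of one.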

For example, suppose the input file is:

fruits      Orange       Apple        Kiwi        Banana
Veggies     Tomato       Potatoe      Carrot      Peas
Bigfruits   WaterMelon   cantaloupes  Papaya      melon

The output file should be:

[{Key:fruits, Values: [Orange, Apple, Kiwi, Banana]}][{Key:Veggies, Values: [Tomato, Potatoe, Carrot, Peas]}][{Key:Bigfruits, Values: [WaterMelon, cantaloupes, Papaya, melon]}]

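Any suggestions for serializing the data with only one "[]", or with no "[]" at all?

data_dict.values() always returns a list object (in Python 2; in Python 3 it is a view that has to be wrapped in list() first), and you are writing that list directly to ujson_file. If that is not what you want, try writing data_dict.values()[0] instead. I only see one dictionary per list, so I am assuming that pattern stays consistent.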
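As a small illustration of the point about data_dict.values(), the sketch below shows how the suggestion behaves on Python 3, where values() gives back a view rather than a list. The sample dictionary is made up for the example and is not taken from the question's data.

```python
# Hypothetical sample data, shaped like the expected output in the question.
data_dict = {1: {"Key": "fruits", "Values": ["Orange", "Apple"]}}

values = data_dict.values()

# On Python 3 the view cannot be indexed directly, so values[0] raises TypeError.
try:
    values[0]
    indexable = True
except TypeError:
    indexable = False

# Wrapping the view in list() works on both Python 2 and Python 3.
first = list(values)[0]

print(indexable)
print(first)
```

Writing list(data_dict.values())[0] therefore dumps the single dictionary without the surrounding list on either Python version.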
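I don't usually write full scripts for people, but I have found myself waiting on batch processes to finish all week.

Try this. It accounts for the failure case I mentioned earlier, is easier to read, and should accomplish what you are looking for: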
import ujson

# Define what our keys are
keys = ('fruits', 'veggies', 'bigfruits')

# Define how big we want each batch
batch_size = 100

# Define a method to write a list out to a json file
# (I think the way you did this is the original source of your problem)
def flush(objs):
    # Text mode ("w") so that ujson.dump can write str on Python 3 as well
    with open("test.json", "w") as ujson_file:
        ujson.dump(objs, ujson_file, indent=4)

# Use a context manager to handle file I/O
with open('test.txt', 'r') as input_source:

    # Create somewhere to put stuff to write to file
    output = []

    # Don't read the entire file into memory, you may run
    # out of memory with larger files...
    # buffer = f.readlines()

    # ...instead, load it line by line.
    for line in input_source:
        data = {}

        # Parse the line, make it a list we can iterate through
        line = line.split(' ')

        # Look through the list, store any value that isn't a known key
        current_key = None
        for term in line:
            # Erase spaces
            term = term.strip() 
            # If it's a blank "word", skip it
            if not term:
                continue

            # If it's a key, let's start a new list
            elif term.lower() in keys: # Lowercase the term just in case capitalization is inconsistent
                data[term] = []
                current_key = term

            # We know the current key we're working with; add this to that list
            else:
                data[current_key].append(term)

        # Add the dict to our output buffer
        output.append(data)

        # If we've written enough to flush, flush it
        if len(output) >= batch_size:
            flush(output)

    # We've reached the end of the file. If we have anything left to flush,
    # do it now.
    flush(output)

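Take a look at How to Ask.

@pvg, I hope the code changes above help. I need help here.

Can you show us a line from test.txt?

@Johnny I have added the input file and the expected output file when dumping in batches. In this example, a batch is 1 record.

Where is data_dict defined, and how do you parse each line of test.txt? Also, you have a failure case: if test.txt contains 111 records, you will only write out 100 of them and drop the remaining 11 on the floor, and if you have fewer than 100 records you will not write anything to test.json at all. Just FYI.

I get the error message: "object does not support indexing".

The file is overwritten with each batch of data. Each batch is not appended to the file.

Thanks for the code. I have similar code, except for calling a separate function to write the data to the file.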
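One way to address the overwrite concern raised in the comments, while still ending up with a single "[]", is to open the array once, write each batch as comma-separated items, and close the array at the end. The following is only a sketch under stated assumptions: it uses the standard-library json module as a stand-in for ujson, it assumes the first word on each line is the key, and the helper names parse_line, write_batch, and write_records are hypothetical, not from the thread.

```python
import io
import json  # stand-in for ujson (assumption for illustration)


def parse_line(line):
    """Assume the first word is the key and the remaining words are its values."""
    terms = line.split()
    if not terms:
        return None  # blank line
    return {"Key": terms[0], "Values": terms[1:]}


def write_batch(batch, out, is_first_batch):
    """Write one batch as comma-separated JSON items, without brackets."""
    if not is_first_batch:
        out.write(", ")
    out.write(", ".join(json.dumps(record) for record in batch))


def write_records(lines, out, batch_size=100):
    """Write every record into ONE JSON array, one batch at a time."""
    out.write("[")
    wrote_any = False
    batch = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        batch.append(record)
        if len(batch) >= batch_size:
            write_batch(batch, out, not wrote_any)
            wrote_any = True
            batch = []
    if batch:  # leftover records that did not fill a whole batch
        write_batch(batch, out, not wrote_any)
    out.write("]")


# Usage with the sample input from the question and a tiny batch size.
sample = [
    "fruits Orange Apple Kiwi Banana\n",
    "Veggies Tomato Potatoe Carrot Peas\n",
    "Bigfruits WaterMelon cantaloupes Papaya melon\n",
]
buf = io.StringIO()  # in-memory file used in place of test.json
write_records(sample, buf, batch_size=2)
result = json.loads(buf.getvalue())
print(len(result))
```

Because the brackets are written once around all batches, the file stays valid JSON no matter how many batches are flushed, and leftover records (the 111-record case) are written by the final check.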