How to serialize data to a given file using the ujson library in Python
I am using the code below to generate a JSON file:
import ujson

data_dict = {}
f = open("test.txt", 'r')
buffer = f.readlines()
rec_cnt = 1
with open("test.json", "w") as ujson_file:
    for line in buffer:
        data_dict[rec_cnt] = {line}
        if rec_cnt == 100:
            # Each dump writes one complete list, i.e. its own [...]
            ujson.dump(data_dict.values(), ujson_file)
            data_dict.clear()
        rec_cnt += 1
f.close()
The problem is that when I process the records in batches, each batch is wrapped in its own [], which produces an invalid JSON file.
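To see why the file ends up invalid, here is a small sketch. It assumes ujson is installed and falls back to the standard json module otherwise (both provide the dumps/loads calls used here); the two sample batches are made up for illustration. Each dump call emits a complete array, and two arrays written back to back do not parse as a single JSON document.

```python
# Use ujson if it is installed; otherwise fall back to the standard library
# json module, which offers the same dumps/loads calls used here.
try:
    import ujson as json_lib
except ImportError:
    import json as json_lib

# Hypothetical sample batches, shaped like the desired output records.
batch_1 = [{"Key": "fruits", "Values": ["Orange", "Apple"]}]
batch_2 = [{"Key": "Veggies", "Values": ["Tomato", "Carrot"]}]

# Writing each batch with its own dump call produces "[...][...]".
file_contents = json_lib.dumps(batch_1) + json_lib.dumps(batch_2)


def is_valid_json(text):
    """Return True if text parses as a single JSON document."""
    try:
        json_lib.loads(text)
        return True
    except ValueError:  # json and ujson both raise ValueError subclasses
        return False


print(file_contents)
print(is_valid_json(file_contents))                      # False: two arrays back to back
print(is_valid_json(json_lib.dumps(batch_1 + batch_2)))  # True: one array
```

The fix, in other words, is to make sure the whole file contains exactly one top-level array (or to switch to a format that has no enclosing array at all).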
For example, the input file would be:

fruits Orange Apple Kiwi Banana
Veggies Tomato Potatoe Carrot Peas
Bigfruits WaterMelon cantaloupes Papaya melon

The output file should be:
[{Key:fruits, Values: [Orange, Apple, Kiwi, Banana]}][{Key:Veggies, Values: [Tomato, Potatoe, Carrot, Peas]}][{Key:Bigfruits, Values: [WaterMelon, cantaloupes, Papaya, melon]}]
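The desired output pairs each key with its list of values. A minimal parsing sketch, assuming each input line starts with its key followed by whitespace-separated values (the one-record-per-line layout is an assumption, since the input above was posted flattened); `parse_line` is a hypothetical helper name, not part of ujson:

```python
def parse_line(line):
    """Turn 'fruits Orange Apple' into {'Key': 'fruits', 'Values': ['Orange', 'Apple']}.

    Assumes the first whitespace-separated word is the key and the rest are
    its values. Returns None for blank lines.
    """
    words = line.split()  # split() with no argument also drops the trailing newline
    if not words:
        return None
    return {"Key": words[0], "Values": words[1:]}


sample = [
    "fruits Orange Apple Kiwi Banana\n",
    "Veggies Tomato Potatoe Carrot Peas\n",
    "Bigfruits WaterMelon cantaloupes Papaya melon\n",
]
records = [parse_line(line) for line in sample]
print(records[0])  # {'Key': 'fruits', 'Values': ['Orange', 'Apple', 'Kiwi', 'Banana']}
```

Passing the list of such dicts to a single dump call yields the one-array shape the question asks for.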
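Any suggestions for serializing the data with only one "[]", or with no "[]" at all?

data_dict.values() always returns a list object (in Python 2), and you write that list directly to ujson_file, so every dump produces its own [...].

If that's not what you want, try writing data_dict.values()[0] (in Python 3, values() returns a view that cannot be indexed, so use list(data_dict.values())[0]). I only see one dictionary per list, so I'm assuming this pattern holds. I don't usually write complete scripts for people, but I've found myself waiting on batch processes to finish all week.

Try this. It accounts for the failure case I mentioned earlier, is easier to read, and should accomplish what you're looking for: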
import ujson

# Define what our keys are
keys = ('fruits', 'veggies', 'bigfruits')
# Define how big we want each batch
batch_size = 100

# Define a method to write a list out to a json file
# (I think the way you did this is the original source of your problem)
def flush(objs):
    # Text mode: ujson.dump writes str, which fails on a binary ("wb") file in Python 3.
    # Mode "w" truncates test.json on every call, so each flush rewrites the
    # whole list accumulated so far (output is never cleared).
    with open("test.json", "w") as ujson_file:
        ujson.dump(objs, ujson_file, indent=4)

# Use a context manager to handle file I/O
with open('test.txt', 'r') as input_source:
    # Create somewhere to put stuff to write to file
    output = []
    # Don't read the entire file into memory, you may run
    # out of memory with larger files...
    # buffer = f.readlines()
    # ...instead, load it line by line.
    for line in input_source:
        data = {}
        # Parse the line, make it a list we can iterate through
        line = line.split(' ')
        # Look through the list, store any value that isn't a known key
        current_key = None
        for term in line:
            # Erase spaces
            term = term.strip()
            # If it's a blank "word", skip it
            if not term:
                continue
            # If it's a key, let's start a new list
            elif term.lower() in keys:  # Lowercase the term just in case capitalization is inconsistent
                data[term] = []
                current_key = term
            # We know the current key we're working with; add this to that list
            # (words before the first key are skipped, avoiding a KeyError on data[None])
            elif current_key is not None:
                data[current_key].append(term)
        # Add the dict to our output buffer
        output.append(data)
        # If we've written enough to flush, flush it
        if len(output) >= batch_size:
            flush(output)

# We've reached the end of the file. If we have anything left to flush,
# do it now.
flush(output)
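Because flush() reopens test.json in "w" mode, every call replaces the file's contents. One way to keep the batching and still end up with a single valid JSON array is to open the output once, write "[" up front, append each batch's records separated by commas, and write "]" at the end. The sketch below assumes the key-first line layout from the question; the function names and the parse_line helper are hypothetical, and it uses ujson.dumps when ujson is installed, otherwise json.dumps. The final, partially filled batch is written after the loop, so a file with 111 records keeps all 111. If you want no brackets at all, write_json_lines shows the JSON Lines alternative (one object per line).

```python
import io

try:
    import ujson as json_lib
except ImportError:
    import json as json_lib


def parse_line(line):
    """Hypothetical parser: the first word is the key, the rest are values."""
    words = line.split()
    return {"Key": words[0], "Values": words[1:]} if words else None


def write_json_array_in_batches(lines, out, batch_size=100):
    """Write every record from `lines` into `out` as ONE JSON array, batch by batch."""
    out.write("[")
    first = True
    batch = []

    def flush_batch():
        nonlocal first
        for record in batch:
            if not first:
                out.write(",")  # comma between records, never before the first one
            out.write(json_lib.dumps(record))
            first = False
        batch.clear()  # start the next batch empty

    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip blank lines
        batch.append(record)
        if len(batch) >= batch_size:
            flush_batch()
    flush_batch()  # write the final, possibly partial, batch
    out.write("]")


def write_json_lines(lines, out):
    """Alternative with no enclosing brackets: one JSON object per line (JSON Lines)."""
    for line in lines:
        record = parse_line(line)
        if record is not None:
            out.write(json_lib.dumps(record) + "\n")


# Demo with in-memory files; batch_size=2 forces more than one batch.
sample = io.StringIO("fruits Orange Apple\nVeggies Tomato\n\nBigfruits Papaya melon\n")
buf = io.StringIO()
write_json_array_in_batches(sample, buf, batch_size=2)
print(buf.getvalue())
```

With real files, pass open("test.txt") as `lines` and open("test.json", "w") as `out`. Keeping the output file open for the whole run is what avoids the overwrite, while the batch list bounds how many parsed records sit in memory at once.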
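See how to write
@pvg, hopefully the code changes above help. I need help here.
Can you show us one line from test.txt?
@Johnny I've added the input file and the expected output file for dumping in batches. From this example we can see batch 1 Record
Where is data_dict defined, and how do you parse each line of test.txt? Also, you have a failure case: if test.txt has 111 records, you only write out 100 of them and throw the remaining 11 on the floor, or if you have fewer than 100 records, you don't write anything to test.json at all. Just FYI.
I get the error message: "object does not support indexing".
The file is overwritten with each batch of data; it doesn't append each batch's data to the file. Thanks for your code. I had similar code, except that I call a separate function to write the data to the file.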
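The "object does not support indexing" error is likely the Python 3 behavior of dict.values(): it returns a view object rather than a list, and views cannot be indexed. A small sketch with a made-up data_dict showing the failure and two common workarounds:

```python
# Hypothetical data_dict holding one record, as in the answer's assumption.
data_dict = {1: {"Key": "fruits", "Values": ["Orange", "Apple"]}}

values_view = data_dict.values()
try:
    values_view[0]  # fine on a Python 2 list, raises TypeError on a Python 3 view
except TypeError as exc:
    print("TypeError:", exc)

# Workarounds: convert the view to a list first, or take the first item lazily.
first_record = list(data_dict.values())[0]
same_record = next(iter(data_dict.values()))
print(first_record == same_record)  # True
```

Either workaround returns the same dict object; next(iter(...)) avoids building an intermediate list.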