How to add commas between JSON objects in a .txt file and then convert them to a JSON array in Python
I am reading a .txt file that contains JSON objects with no commas separating them. I would like to add commas between the JSON objects and put them all into a JSON list or array. I tried json.loads, but I got a JSON decode error, so I realized I need to put commas between the objects in the .txt file. Below is a sample of the .txt file's contents:
{
  "@mdate": "2011-01-11",
  "@key": "journals/acta/Saxena96",
  "author": {
    "ftail": "\n",
    "ftext": "Sanjeev Saxena"
  },
  "title": {
    "ftail": "\n",
    "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
  },
  "pages": {
    "ftail": "\n",
    "ftext": "607-619"
  },
  "year": {
    "ftail": "\n",
    "ftext": "1996"
  },
  "volume": {
    "ftail": "\n",
    "ftext": "33"
  },
  "journal": {
    "ftail": "\n",
    "ftext": "Acta Inf."
  },
  "number": {
    "ftail": "\n",
    "ftext": "7"
  },
  "url": {
    "ftail": "\n",
    "ftext": "db/journals/acta/acta33.htmlfSaxena96"
  },
  "ee": {
    "ftail": "\n",
    "ftext": "http://dx.doi.org/10.1007/BF03036466"
  },
  "ftail": "\n",
  "ftext": "\n"
}{
  "@mdate": "2011-01-11",
  "@key": "journals/acta/Simon83",
  "author": {
    "ftail": "\n",
    "ftext": "Hans-Ulrich Simon"
  },
  "title": {
    "ftail": "\n",
    "ftext": "Pattern Matching in Trees and Nets."
  },
  "pages": {
    "ftail": "\n",
    "ftext": "227-248"
  },
  "year": {
    "ftail": "\n",
    "ftext": "1983"
  },
  "volume": {
    "ftail": "\n",
    "ftext": "20"
  },
  "journal": {
    "ftail": "\n",
    "ftext": "Acta Inf."
  },
  "url": {
    "ftail": "\n",
    "ftext": "db/journals/acta/acta20.htmlfSimon83"
  },
  "ee": {
    "ftail": "\n",
    "ftext": "http://dx.doi.org/10.1007/BF01257084"
  },
  "ftail": "\n",
  "ftext": "\n"
}
Expected result: the same objects, separated by commas and wrapped in a single JSON array.
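If you can always guarantee that your JSON is formatted as in the example, i.e. a new JSON object starts on the same line where the previous one ends, and that closing brace is not indented, you can simply read the file into a buffer until you hit such a line and then hand the buffer to the JSON parser. Rinse and repeat: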
import json

parsed = []  # a list to hold individually parsed JSON objects
with open('path/to/your.json') as f:
    buffer = ''
    for line in f:
        if line.startswith('}'):  # end of the current JSON object
            parsed.append(json.loads(buffer + '}'))
            buffer = line[1:]  # keep whatever follows the closing brace
        else:
            buffer += line

print(json.dumps(parsed, indent=2))  # just to make sure it all went well
This will produce:
[
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
      "ftail": "\n",
      "ftext": "Sanjeev Saxena"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "607-619"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1996"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "33"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "number": {
      "ftail": "\n",
      "ftext": "7"
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
  },
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
      "ftail": "\n",
      "ftext": "Hans-Ulrich Simon"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "227-248"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1983"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "20"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
  }
]
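Since the question asks for a JSON array rather than a printout, the parsed list can also be written back to disk with json.dump. This is only a minimal sketch: the helper name write_json_array, the output file name, and the two-element sample list are made up for illustration and stand in for the real parsed list.

```python
import json
import os
import tempfile


def write_json_array(objects, path):
    """Write a list of parsed objects to `path` as one JSON array."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(objects, f, indent=2, ensure_ascii=False)
        f.write('\n')


# Hypothetical stand-in for the `parsed` list built from the input file.
sample = [{"@key": "journals/acta/Saxena96"}, {"@key": "journals/acta/Simon83"}]
out_path = os.path.join(tempfile.gettempdir(), 'combined_example.json')
write_json_array(sample, out_path)

# The resulting file is a single JSON document, so one json.load call reads it.
with open(out_path, encoding='utf-8') as f:
    round_trip = json.load(f)
```

The output file then holds one valid JSON document, which is what json.load and json.loads expect.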
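If your case is less clear-cut (i.e. you cannot predict the formatting), you could try an iterative/event-based JSON parser (e.g. ijson) that can tell you when a "root" object closes, so that you can split the parsed JSON objects into a sequence.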
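To illustrate what "knowing when the root object closes" means, here is a hand-rolled sketch that uses only the standard library. It is not ijson's API, just the same idea done by tracking nesting depth; the function name split_root_values and the sample string are assumptions made for this example.

```python
import json


def split_root_values(text):
    """Yield the source text of each top-level {...} or [...] in `text`.

    Nesting depth and string state are tracked by hand, so braces and
    brackets that appear inside string values are ignored.
    """
    depth = 0
    in_string = False
    escaped = False
    start = None
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False          # this character was escaped
            elif ch == '\\':
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
        elif ch == '"':
            in_string = True
        elif ch in '{[':
            if depth == 0:
                start = i                # a new root value begins here
            depth += 1
        elif ch in '}]' and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]  # the root value just closed
                start = None


# Made-up input: values back to back, with braces hidden inside strings.
data = '{"a": {"b": "x}{y"}}\n[1, 2]{"c": "\\"}"}'
parsed = [json.loads(chunk) for chunk in split_root_values(data)]
```

Each chunk is still validated by json.loads, so malformed input raises an error instead of being accepted silently.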
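UPDATE: On second thought, you don't need anything beyond the built-in json module, even if the concatenated JSON is not properly indented or not indented at all. You can use json.JSONDecoder.raw_decode() (and its undocumented second parameter) to walk through the data and look for valid JSON structures iteratively until you have traversed the whole file (or hit an error). For example: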
import json

parser = json.JSONDecoder()
parsed = []  # a list to hold individually parsed JSON structures
with open('test.json') as f:
    data = f.read()

head = 0  # hold the current position as we parse
while True:
    # jump to the next '{' or, failing that, the next '['; -1 if neither is found
    head = (data.find('{', head) + 1 or data.find('[', head) + 1) - 1
    try:
        struct, head = parser.raw_decode(data, head)
        parsed.append(struct)
    except (ValueError, json.JSONDecodeError):  # no more valid JSON structures
        break

print(json.dumps(parsed, indent=2))  # make sure it all went well
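This should give the same result as above, but this time it does not depend on } being the first character of a new line when a JSON object "closes". It should also work for JSON arrays stacked back to back.

You can use a regexp to add commas between the objects: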
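import re

# src/dst instead of input/output, so the built-in input() is not shadowed
with open('name.txt', 'r') as src, open('out.txt', 'w') as dst:
    dst.write("[\n")
    for line in src:
        # insert a comma wherever one object closes and the next opens
        line = re.sub(r'\}\{', '},{', line)
        dst.write(' ' + line)
    dst.write("]\n")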
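Thank you very much for your help. But the JSON I receive is not formatted the same way as the sample I provided. How can I use ijson to split the JSON based on the root node?

Using regex to change hierarchical/nested structures is risky: you can inadvertently change deeper nested structures or values this way (i.e. "ftext": "I contain {this}{that}" will get an extra comma). Besides, if all you want is a simple string replacement, regex is overkill; a plain string replace does the job just fine.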
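The pitfall described in this comment is easy to reproduce. The sketch below uses a made-up two-object string, with the offending value taken from the comment, and compares a blind replacement against the raw_decode approach from the earlier answer:

```python
import json

# Two concatenated objects; the first has "}{" inside a string value.
data = '{"ftext": "I contain {this}{that}"}{"ftext": "plain"}'

# Blind replacement still yields valid JSON, but it also rewrites
# the text inside the string value.
naive = json.loads('[' + data.replace('}{', '},{') + ']')

# raw_decode parses one value at a time and leaves string contents alone.
# Note: it does not skip leading whitespace; this sample has none.
decoder = json.JSONDecoder()
safe = []
pos = 0
while pos < len(data):
    obj, pos = decoder.raw_decode(data, pos)
    safe.append(obj)

print(naive[0]["ftext"])  # the value has gained a comma
print(safe[0]["ftext"])   # the value is unchanged
```

The replacement version parses without any error, which is exactly what makes this kind of corruption hard to notice.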