How to add commas between JSON objects in a .txt file and then convert them into a JSON array in Python

Tags: python, json, python-3.x

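I am reading a .txt file that contains JSON objects with no commas separating them. I would like to add commas between the JSON objects and put them all into a JSON list or array.

I tried json.loads, but I got a JSON decode error, so I realized I need to put commas between the individual objects in the .txt file.

Here is a sample of the .txt file's contents: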

{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
        "ftail": "\n",
        "ftext": "Sanjeev Saxena"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "607-619"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1996"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "33"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "number": {
        "ftail": "\n",
        "ftext": "7"
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
}{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
        "ftail": "\n",
        "ftext": "Hans-Ulrich Simon"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "227-248"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1983"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "20"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
}
Expected result: a single JSON array (list) containing these objects, separated by commas.


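If you can always guarantee that your JSON is formatted as in your example, i.e. a new JSON object starts on the same line where the previous one ends, and that line is not indented, you can simply read the lines into a buffer until you encounter such a line, then send the buffer off for JSON parsing. Rinse and repeat: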

import json

parsed = []  # a list to hold individually parsed JSON objects
with open('path/to/your.json') as f:
    buffer = ''
    for line in f:
        if line[0] == '}':  # end of the current JSON object
            parsed.append(json.loads(buffer + '}'))
            buffer = line[1:]
        else:
            buffer += line

print(json.dumps(parsed, indent=2))  # just to make sure it all went well
This will produce:

[
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
      "ftail": "\n",
      "ftext": "Sanjeev Saxena"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "607-619"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1996"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "33"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "number": {
      "ftail": "\n",
      "ftext": "7"
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
  },
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
      "ftail": "\n",
      "ftext": "Hans-Ulrich Simon"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "227-248"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1983"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "20"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
  }
]
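If your case is less clear-cut (i.e. you cannot predict the formatting), you can try an iterative/event-based JSON parser, which can tell you when a "root" object closes, so that you can split the input into a sequence of separately parsed JSON objects.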
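To illustrate what "knowing when the root object closes" means, here is a minimal hand-rolled sketch that uses only the standard library. It is not a replacement for a real event-based parser: it assumes the top-level values are all objects, and the helper name split_root_objects and the sample string are made up for this illustration.

```python
import json


def split_root_objects(text):
    """Yield each top-level {...} chunk of `text` as a string.

    Hypothetical helper: it tracks brace depth and ignores braces that
    appear inside JSON string literals (including escaped quotes).
    """
    depth = 0          # current nesting level of braces
    start = None       # index where the current root object began
    in_string = False  # are we inside a "..." literal?
    escaped = False    # was the previous character a backslash?
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '{':
            if depth == 0:
                start = i
            depth += 1
        elif ch == '}' and depth > 0:
            depth -= 1
            if depth == 0:  # the root object just closed
                yield text[start:i + 1]
                start = None


# Made-up sample: three objects back to back, with braces inside strings
sample = '{"a": {"b": 1}}{"c": "x}{y"}\n{"d": "quote \\" }"}'
objects = [json.loads(chunk) for chunk in split_root_objects(sample)]
print(objects)
```

Each chunk is still handed to json.loads, so malformed input raises an error instead of being silently accepted.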

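Update: On second thought, you don't need anything beyond the built-in json module, even if the concatenated JSON is not properly formatted or indented. You can use json.JSONDecoder.raw_decode() (and its undocumented second parameter) to walk through the data and iteratively look for valid JSON structures until the whole file has been traversed (or an error is encountered). For example: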

import json

parser = json.JSONDecoder()
parsed = []  # a list to hold individually parsed JSON structures
with open('test.json') as f:
    data = f.read()
head = 0  # current position in the data as we parse
while True:
    # find where the next object or array starts, whichever comes first
    starts = [i for i in (data.find('{', head), data.find('[', head)) if i != -1]
    if not starts:  # nothing left to parse
        break
    try:
        struct, head = parser.raw_decode(data, min(starts))
        parsed.append(struct)
    except json.JSONDecodeError:  # invalid JSON at this position, stop
        break

print(json.dumps(parsed, indent=2))  # make sure it all went well

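This should give the same result as above, but this time it does not depend on } being the first character of a new line when a JSON object "closes". It also works for JSON arrays stacked back to back.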
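Since the goal in the question is a JSON array, the same raw_decode() idea can be wrapped in a small generator and the result written back out as a list. This is only a sketch: the names iter_concatenated_json and to_json_array, the file paths passed to them, and the UTF-8 encoding are assumptions, not part of the original answer.

```python
import json


def iter_concatenated_json(text):
    """Yield every JSON value found back to back in `text` (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode() does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # returns the decoded value and the index just past it
        value, pos = decoder.raw_decode(text, pos)
        yield value


def to_json_array(src_path, dst_path):
    """Read concatenated JSON from src_path, write one JSON array to dst_path."""
    with open(src_path, encoding='utf-8') as src:  # encoding is an assumption
        values = list(iter_concatenated_json(src.read()))
    with open(dst_path, 'w', encoding='utf-8') as dst:
        json.dump(values, dst, indent=2)
    return len(values)
```

Unlike the loop above, this version lets json.JSONDecodeError propagate on malformed input instead of stopping quietly, so a truncated file is not mistaken for a complete one. A call would look like to_json_array('name.txt', 'out.json').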

You can use a regexp to add commas between the objects:

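import re

# 'src' and 'dst' avoid shadowing the built-in input()
with open('name.txt', 'r') as src, open('out.txt', 'w') as dst:
    dst.write("[\n")
    for line in src:
        # insert a comma wherever one object ends and the next begins
        line = re.sub(r'\}\{', '},{', line)
        dst.write('    ' + line)
    dst.write("]\n")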

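Thank you very much for your help. But the JSON I receive is not in the same format as the sample I provided. How can I use ijson to split the JSON based on the root node?

Using regex to change hierarchical/nested structures is a bad idea: you can inadvertently change deeper nested structures or values this way (i.e. "ftext": "I contain {this}{that}" will get an extra comma). Besides, if all you want is a simple string replacement, regex is overkill; a plain string replacement does the job just fine.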
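The pitfall described in the comment can be reproduced in a few lines. This is a small sketch with a made-up input string; it compares blind substitution against decoding one value at a time with json.JSONDecoder.raw_decode().

```python
import json
import re

# Two concatenated objects; the first holds a string value containing "}{"
raw = '{"ftext": "I contain {this}{that}"}{"ftext": "plain"}'

# Blind substitution also rewrites the text inside the string value
patched = '[' + re.sub(r'\}\{', '},{', raw) + ']'
via_regex = json.loads(patched)

# Decoding value by value leaves string contents untouched
decoder = json.JSONDecoder()
via_decoder, pos = [], 0
while pos < len(raw):
    value, pos = decoder.raw_decode(raw, pos)
    via_decoder.append(value)

print(via_regex[0]['ftext'])    # the value has gained a comma
print(via_decoder[0]['ftext'])  # the value is unchanged
```

The patched text still parses, which is what makes this kind of corruption easy to miss.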