Python: How can I convert multi-record, multi-line JSON into one-record-per-line JSON for AWS Athena?

I want to use JSON files in AWS Athena, but Athena doesn't support multi-line JSON.

I have the following (one of the values is XML):

{
  "id" : 10,
  "name" : "bob",
  "data" : "<some> \n <xml> \n <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \n <xml> \n <in here>"
}

For Athena, I need it to look like this:

{ "id" : 10, "name" : "bob", "data" : "<some> <xml> <in here>" },
{ "id" : 20, "name" : "jane", "data" : "<other> <xml> <in here>" }
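Athena's JSON SerDes read one record per physical line, so what matters is that each object is serialized without literal line breaks; newlines inside a value can stay as \n escapes. A minimal sketch using the sample record from above, showing that Python's json.dumps already produces that single-line form:

```python
import json

# The sample record, with real line breaks inside the XML "data" value.
record = {"id": 10, "name": "bob", "data": "<some>\n<xml>\n<in here>"}

# json.dumps escapes the line breaks as \n, so the whole record stays on one line.
line = json.dumps(record)
print(line)
# {"id": 10, "name": "bob", "data": "<some>\n<xml>\n<in here>"}
```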

I'm using RazorSQL to export the data from DB2, and I've tried writing some Python code to "flatten" the data, but without success so far.

Thanks, everyone!

Use regular expressions:

import re

html = '''
{
  "id" : 10,
  "name" : "bob",
  "data" : "<some> \n <xml> \n <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \n <xml> \n <in here>"
}
'''


def replaceReg(html, regex, new):
    # Replace every match of the regular expression in html
    return re.sub(regex, new, html)

html = replaceReg(html, ' \n ', ' ')        # join the XML lines inside "data"
html = replaceReg(html, r'\{\s+', '{ ')     # collapse whitespace after an opening brace
html = replaceReg(html, r'\s+\}', ' }')     # collapse whitespace before a closing brace
html = replaceReg(html, r',\s+', ', ')      # collapse line breaks after commas
html = replaceReg(html, r'\}, ', '}\n')     # one record per line, keeping the closing brace
print(html)

Result:

{ "id" : 10, "name" : "bob", "data" : "<some> <xml> <in here>" }
{ "id" : 20, "name" : "jane", "data" : "<other> <xml> <in here>" }
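The regex approach depends on the exact whitespace in the export. A JSON-aware alternative, sketched here under the assumption that the records are objects separated only by commas and whitespace (as in the question), parses each object with json.JSONDecoder.raw_decode and re-serializes it on one line. The helper names iter_objects and to_json_lines are hypothetical; strict=False is what lets a string value contain raw line breaks, which strict JSON forbids.

```python
import json


def iter_objects(text):
    """Yield each top-level JSON object from text shaped like '{...},\n{...}'.

    Assumption: records are separated only by whitespace and/or commas.
    """
    # strict=False allows raw (unescaped) line breaks inside string values.
    decoder = json.JSONDecoder(strict=False)
    pos, end = 0, len(text)
    while True:
        # Skip whitespace and the commas between records.
        while pos < end and (text[pos].isspace() or text[pos] == ","):
            pos += 1
        if pos >= end:
            return
        # raw_decode parses one value starting at pos and returns where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def to_json_lines(text):
    # One compact object per line; raw line breaks come back out escaped as \n.
    return "".join(json.dumps(obj) + "\n" for obj in iter_objects(text))


sample = """{
  "id" : 10,
  "name" : "bob",
  "data" : "<some>
 <xml>
 <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \\n <xml> \\n <in here>"
}
"""
print(to_json_lines(sample), end="")
```

The first sample record has real line breaks in "data" and the second uses \n escapes; both come out as single lines.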

I ended up doing something quick and dirty:

import json
with open('data.json') as jfile:
    data = json.load(jfile)   # the export is one JSON array of objects
    for d in data:
        # json.dumps writes each object on a single line
        print(json.dumps(d) + ',')

Which prints:

{'id': 200, 'name': 'bob', 'data': '<other> \n <xml> \n <data>'},
{"id": 200, "name": "bob", "data": "<other> \n <xml> \n <data>"},

Then I just saved the output to another file :p
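Instead of saving the printed output by hand, the same json.load idea can write the file directly. A minimal sketch, assuming the export is a single JSON array of objects; array_file_to_json_lines and the file paths are placeholders. It writes one object per line with no separating commas (the JSON Lines layout):

```python
import json


def array_file_to_json_lines(src_path, dst_path):
    """Rewrite a file holding one JSON array of objects as one object per line.

    Loads the whole array into memory, like the json.load approach above.
    """
    with open(src_path, encoding="utf-8") as f_in:
        # strict=False tolerates raw line breaks inside string values.
        records = json.load(f_in, strict=False)
    with open(dst_path, "w", encoding="utf-8") as f_out:
        for record in records:
            # One object per line, no separating commas.
            f_out.write(json.dumps(record) + "\n")
```

Usage would be something like array_file_to_json_lines('data.json', 'data.jsonl'), after which the output file can be uploaded to the S3 location the Athena table points at.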

It failed because the file was too big, but hey... so close!
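If the file is too big to load at once, one option is to stream it: read fixed-size chunks and call raw_decode whenever a complete object is in the buffer. This is a sketch, not a hardened parser. It assumes every record is a JSON object, either inside one top-level array or separated by commas, and stream_records and chunk_size are hypothetical names. For well-formed input, memory use stays around the size of the largest record plus one chunk rather than the whole file.

```python
import json


def stream_records(f, chunk_size=64 * 1024):
    """Yield JSON objects one at a time from a text file object.

    Assumes the top-level values are objects, wrapped in one JSON array or
    separated by commas/whitespace, so '[', ']', ',' and whitespace between
    records can simply be skipped.
    """
    decoder = json.JSONDecoder(strict=False)  # allow raw line breaks in strings
    buf, eof = "", False
    while True:
        # Drop separators left over between records.
        buf = buf.lstrip(" \t\r\n,[]")
        if buf:
            try:
                # Note: each retry re-parses the buffer from its start.
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if eof:
                    raise  # trailing text that is not a complete object
            else:
                yield obj
                buf = buf[end:]
                continue
        elif eof:
            return
        # The next record is incomplete (or the buffer is empty): read more.
        chunk = f.read(chunk_size)
        eof = not chunk
        buf += chunk


def convert_streaming(src_path, dst_path):
    # Hypothetical driver: write each record as one line of the output file.
    with open(src_path, encoding="utf-8") as f_in, \
         open(dst_path, "w", encoding="utf-8") as f_out:
        for record in stream_records(f_in):
            f_out.write(json.dumps(record) + "\n")
```

Invalid JSON in the middle of the file is only reported at end of file, because until then the sketch treats a parse failure as "need more data".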

You can simply strip the newline characters (\n) while writing to the other file:

with open('input.txt', 'r') as f_in, open('output.txt', 'w') as f_out:
    for line in f_in:
        # Drop each physical line break (all records end up on one line)
        f_out.write(line.replace('\n', ''))

where input.txt contains the following data:

{
  "id" : 10,
  "name" : "bob",
  "data" : "<some> \n <xml> \n <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \n <xml> \n <in here>"
}
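Stripping every line break puts all records onto a single line rather than one record per line. Whichever conversion is used, a small check like the hypothetical check_json_lines below can confirm before uploading that each line parses on its own as one JSON object. It is deliberately strict: a line such as {...},{...}, or one ending in "},", makes json.loads fail with an "Extra data" error, so it would also flag the trailing-comma format from the question. Skipping blank lines is a choice of this sketch.

```python
import json


def check_json_lines(path):
    """Return the number of records, raising ValueError on the first bad line.

    Each non-empty line must parse on its own as a single JSON object.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # this sketch tolerates blank lines
            try:
                value = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {lineno}: {err}") from err
            if not isinstance(value, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            count += 1
    return count
```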

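This is neither valid JSON syntax nor meaningful Python syntax. Is this what's in the file? Can you be more specific about what the problem is? See .

The actual JSON file is more like an array: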

[{"prop": "value"}, {"prop": "value"}]

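But Athena only seems to accept it in the form shown in my example. I tried it, and it works in Athena in that format, but don't take my word for it, since I'm just learning.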
