
Python: How do I convert multi-record, multi-line JSON into single-line-per-record JSON for AWS Athena?


I want to use JSON files in AWS Athena, but Athena does not support multi-line JSON.

I have the following (one of the values is XML):

{
  "id" : 10,
  "name" : "bob",
  "data" : "<some> \n <xml> \n <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \n <xml> \n <in here>"
}
For Athena, I need it to look like this:

{ "id" : 10, "name" : "bob", "data" : "<some> <xml> <in here>" },
{ "id" : 20, "name" : "jane", "data" : "<other> <xml> <in here>" }
I am exporting the data from DB2 with RazorSQL and have been trying to write some Python code to "flatten" it, but have not succeeded yet.

Thanks, everyone!

Use regular expressions:

import re

# Sample input: the "\n" escapes become real newlines inside the XML values
html = '''
{
  "id" : 10,
  "name" : "bob",
  "data" : "<some> \n <xml> \n <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \n <xml> \n <in here>"
}
'''


def replaceReg(html, regex, new):
    return re.sub(regex, new, html)

html = replaceReg(html, r' \n ', ' ')    # join the line breaks inside the XML values
html = replaceReg(html, r'\{\s+', '{ ')  # collapse whitespace after each opening brace
html = replaceReg(html, r'\s+\}', ' }')  # collapse whitespace before each closing brace
html = replaceReg(html, r',\s+', ', ')   # collapse whitespace after each comma
html = replaceReg(html, r'\}, ', '}\n')  # one record per line, keeping the closing brace
print(html)
Result:

{ "id" : 10, "name" : "bob", "data" : "<some> <xml> <in here>" }
{ "id" : 20, "name" : "jane", "data" : "<other> <xml> <in here>" }

I ended up doing something quick and dirty:

import json
with open('data.json') as jfile:
    data = json.load(jfile)
    for d in data:
        print(json.dumps(d) + ',')
which prints:

{'id': 200, 'name': 'bob', 'data': '<other> \n <xml> \n <data>'},
{"id": 200, "name": "bob", "data": "<other> \n <xml> \n <data>"},
Then I just saved the output to another file :p
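
The same idea can be written so that each record lands on its own line without a trailing comma. This is only a minimal sketch, assuming that Athena wants one complete JSON object per line and that the input parses as a list of records; `to_json_lines` and the sample data are hypothetical names made up for illustration.

```python
import json


def to_json_lines(records):
    """Return one compact JSON object per line, with no trailing commas."""
    # json.dumps escapes newlines inside string values, so each record stays on one line.
    return "\n".join(json.dumps(rec) for rec in records)


# Hypothetical sample standing in for the parsed contents of data.json.
records = json.loads(
    '[{"id": 10, "name": "bob", "data": "<some> \\n <xml>"},'
    ' {"id": 20, "name": "jane", "data": "<other> \\n <xml>"}]'
)
print(to_json_lines(records))
```

Because the newlines inside the XML value are kept as `\n` escapes rather than removed, the data round-trips unchanged when it is parsed again.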


It failed because the file was too big, but hey... so close!
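
For input that is too large to load in one go, one option is to decode the records incrementally with `json.JSONDecoder.raw_decode` from the standard library. The sketch below is an illustration under assumptions, not a tested solution for the original file: it assumes every record is a JSON object, that records are separated only by commas and whitespace (optionally wrapped in one enclosing array), and `iter_records` and `chunk_size` are hypothetical names.

```python
import io
import json


def iter_records(stream, chunk_size=65536):
    """Yield each JSON object from a stream of comma-separated objects."""
    decoder = json.JSONDecoder()
    buffer = ""
    eof = False
    while True:
        # Drop separators between records and the brackets of an enclosing array.
        # Assumption: the records themselves are objects, never bare arrays.
        buffer = buffer.lstrip(" \t\r\n,[]")
        if buffer:
            try:
                record, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if eof:
                    raise  # malformed or truncated input
                # Otherwise the record is incomplete: fall through and read more.
            else:
                yield record
                buffer = buffer[end:]
                continue
        elif eof:
            return
        chunk = stream.read(chunk_size)
        if chunk:
            buffer += chunk
        else:
            eof = True


# Small in-memory sample; a real run would pass an open file object instead.
sample = io.StringIO('{\n  "id" : 10,\n  "data" : "<some> \\n <xml>"\n},\n{\n  "id" : 20\n}\n')
for record in iter_records(sample, chunk_size=16):
    print(json.dumps(record))
```

Only the current chunk and one partial record are held in memory at a time, so the size of the whole file matters less than the size of the largest single record.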

You can simply strip the line breaks (\n) while writing to another file:

with open('input.txt', 'r') as f_in, open('output.txt', 'w') as f_out:
    for line in f_in:
        # Strip the line break and write straight through,
        # so the whole file is never held in memory at once
        f_out.write(line.replace('\n', ''))
where input.txt contains the following data:

{
  "id" : 10,
  "name" : "bob",
  "data" : "<some> \n <xml> \n <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \n <xml> \n <in here>"
}

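This is not valid JSON syntax, nor is it meaningful Python syntax. Is this really what is in the file? Can you be more specific about what the problem is?
The actual JSON file is more like an array, [{"prop":"value"},{"prop":"value"}], but Athena only seems to like the format shown in my example. I tried it and it works in Athena in that format, but don't take my word for it, as I am still learning.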