Python: How do I change records on BigQuery from different rows into one row?
I have inserted values into BigQuery from a JSON file, but my JSON file contains multiple objects. The result on BigQuery is one row per object, with the columns belonging to the other objects left empty (null) in that row. How can I get everything into a single row with different columns?
import json
import sys
import time

import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# projectId, datasetId and tableId are assumed to be defined at module level.


def loadTable(http, service):
    url = "https://www.googleapis.com/upload/bigquery/v2/projects/" + projectId + "/jobs"
    # Multipart body: the first part is the load job configuration,
    # the second part is the newline-delimited JSON data itself.
    newresource = ('--xxx\n' +
                   'Content-Type: application/json; charset=UTF-8\n' + '\n' +
                   '{\n' +
                   '  "configuration": {\n' +
                   '    "load": {\n' +
                   '      "sourceFormat": "NEWLINE_DELIMITED_JSON",\n' +
                   # JSON boolean rather than the string "True"
                   '      "autodetect": true,\n' +
                   '      "destinationTable": {\n' +
                   '        "projectId": "' + projectId + '",\n' +
                   '        "datasetId": "' + datasetId + '",\n' +
                   '        "tableId": "' + tableId + '"\n' +
                   '      }\n' +
                   '    }\n' +
                   '  }\n' +
                   '}\n' +
                   '--xxx\n' +
                   'Content-Type: application/octet-stream\n' +
                   '\n')
    # Append the data file; the context manager closes it afterwards.
    with open('samplejson.json', 'r') as f:
        newresource += f.read().replace('\n', '\r\n')
    newresource += '--xxx--\n'
    print(newresource)
    headers = {'Content-Type': 'multipart/related; boundary=xxx'}
    resp, content = http.request(url, method="POST", body=newresource, headers=headers)
    if resp.status != 200:
        print(resp)
        print(content)
    else:
        jsonResponse = json.loads(content)
        jobReference = jsonResponse['jobReference']['jobId']
        # Poll the job every 10 seconds until it reports DONE.
        while True:
            jobCollection = service.jobs()
            getJob = jobCollection.get(projectId=projectId, jobId=jobReference).execute()
            currentStatus = getJob['status']['state']
            if 'DONE' == currentStatus:
                print("Done Loading!")
                return
            else:
                print('Waiting to load...')
                print('Current status: ' + currentStatus)
                print(time.ctime())
                time.sleep(10)


def main(argv):
    credentials = ServiceAccountCredentials.from_json_keyfile_name("samplecredentials.json")
    scope = ['https://www.googleapis.com/auth/bigquery']
    credentials = credentials.create_scoped(scope)
    http = httplib2.Http()
    http = credentials.authorize(http)
    service = build('bigquery', 'v2', http=http)
    loadTable(http, service)


if __name__ == '__main__':
    main(sys.argv)
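As a side note on the code above: the job configuration is assembled by hand through string concatenation, which makes it easy to end up with a string where a JSON boolean is expected. Below is a minimal sketch of building the same kind of multipart body with `json.dumps`. The helper name `build_load_body`, its parameters, and the sample project, dataset, and table names are hypothetical and only for illustration; the sketch uses `\n` line endings throughout and makes no network call.

```python
import json


def build_load_body(project_id, dataset_id, table_id, data, boundary="xxx"):
    """Return a multipart/related body: job configuration part plus data part."""
    config = {
        "configuration": {
            "load": {
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "autodetect": True,  # serialized as the JSON boolean true
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }
    parts = [
        "--" + boundary,
        "Content-Type: application/json; charset=UTF-8",
        "",
        json.dumps(config, indent=2),
        "--" + boundary,
        "Content-Type: application/octet-stream",
        "",
        data.rstrip("\n"),
        "--" + boundary + "--",
        "",
    ]
    return "\n".join(parts)


# Hypothetical identifiers and data, for illustration only.
body = build_load_body(
    "my-project", "my_dataset", "my_table", '{"a": "x"}\n{"b": "y"}\n'
)
print(body)
```

The returned string could then be passed as `body=` to `http.request` together with the `multipart/related; boundary=xxx` header, as in the original function.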
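I suggest using a query of the following kind (BigQuery Standard SQL) to assemble everything into one row:

#standardSQL
SELECT
  ARRAY_AGG(a IGNORE NULLS) AS a,
  ARRAY_AGG(b IGNORE NULLS) AS b,
  ARRAY_AGG(c IGNORE NULLS) AS c
FROM `yourTable`

If you have an extra field that indicates which rows should be combined/grouped into one row (for example, some id), the query can look like this:

#standardSQL
SELECT
  id,
  ARRAY_AGG(a IGNORE NULLS) AS a,
  ARRAY_AGG(b IGNORE NULLS) AS b,
  ARRAY_AGG(c IGNORE NULLS) AS c
FROM `yourTable`
GROUP BY id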
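To make the effect of this kind of aggregation concrete, here is a minimal pure-Python sketch of what `ARRAY_AGG(... IGNORE NULLS)`, optionally combined with `GROUP BY id`, does to sparse rows. It does not call BigQuery; the function name `collapse_rows`, the sample rows, and the column names `a`, `b`, `c` are assumptions made for illustration.

```python
from collections import OrderedDict


def collapse_rows(rows, columns, key=None):
    """Collect the non-null values of each column, optionally grouped by `key`."""
    groups = OrderedDict()
    for row in rows:
        group_id = row.get(key) if key is not None else None
        merged = groups.setdefault(group_id, {col: [] for col in columns})
        for col in columns:
            value = row.get(col)
            if value is not None:  # mirrors IGNORE NULLS
                merged[col].append(value)
    result = []
    for group_id, merged in groups.items():
        out = dict(merged)
        if key is not None:
            out[key] = group_id
        result.append(out)
    return result


# Hypothetical sparse rows, one per JSON object that was loaded.
rows = [
    {"id": 1, "a": "x", "b": None, "c": None},
    {"id": 1, "a": None, "b": "y", "c": None},
    {"id": 1, "a": None, "b": None, "c": "z"},
]

print(collapse_rows(rows, ["a", "b", "c"]))            # no grouping field
print(collapse_rows(rows, ["a", "b", "c"], key="id"))  # grouped by id
```

Each column in the result is an array, as it is with `ARRAY_AGG`. The sketch keeps values in input order, whereas BigQuery does not guarantee the order of aggregated elements unless the aggregation specifies one.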