How do I combine BigQuery records from separate rows into a single row? (Python, JSON, Google BigQuery)


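I have loaded values into BigQuery from a JSON file, but my JSON file contains multiple objects.

For example:

The result in BigQuery is one row per object, with the columns that belong to the other objects left null. How can I get everything into a single row with different columns?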

# Imports the snippet relies on.
import json
import sys
import time

import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# projectId, datasetId and tableId are assumed to be defined at module level.


def loadTable(http, service):
    url = "https://www.googleapis.com/upload/bigquery/v2/projects/" + projectId + "/jobs"

    # First multipart section: the load job configuration.
    newresource = ('--xxx\n' +
            'Content-Type: application/json; charset=UTF-8\n' + '\n' +
            '{\n' +
            '   "configuration": {\n' +
            '     "load": {\n' +
            '     "sourceFormat": "NEWLINE_DELIMITED_JSON",\n' +
            '     "autodetect": true,\n' +  # JSON boolean, not the string "True"
            '      "destinationTable": {\n' +
            '        "projectId": "' + projectId + '",\n' +
            '        "datasetId": "' + datasetId + '",\n' +
            '        "tableId": "' + tableId + '"\n' +
            '      }\n' +
            '    }\n' +
            '  }\n' +
            '}\n' +
            '--xxx\n' +
            'Content-Type: application/octet-stream\n' +
            '\n')

    # Second multipart section: the file contents, one JSON object per line.
    with open('samplejson.json', 'r') as f:
        newresource += f.read().replace('\n', '\r\n')

    newresource += ('--xxx--\n')
    print(newresource)

    headers = {'Content-Type': 'multipart/related; boundary=xxx'}
    resp, content = http.request(url, method="POST", body=newresource, headers=headers)

    if not resp.status == 200:
        print(resp)
        print(content)
        return  # no job was created, so there is nothing to poll

    jsonResponse = json.loads(content)
    jobReference = jsonResponse['jobReference']['jobId']

    # Poll the job until BigQuery reports it as finished.
    while True:
        jobCollection = service.jobs()
        getJob = jobCollection.get(projectId=projectId, jobId=jobReference).execute()
        currentStatus = getJob['status']['state']

        if 'DONE' == currentStatus:
            # DONE only means the job ended; errorResult is set if it failed.
            if 'errorResult' in getJob['status']:
                print(getJob['status']['errorResult'])
            else:
                print("Done Loading!")
            return

        else:
            print('Waiting to load...')
            print('Current status: ' + currentStatus)
            print(time.ctime())
            time.sleep(10)


def main(argv):
    credentials = ServiceAccountCredentials.from_json_keyfile_name("samplecredentials.json")
    scope = ['https://www.googleapis.com/auth/bigquery']
    credentials = credentials.create_scoped(scope)

    http = httplib2.Http()
    http = credentials.authorize(http)

    service = build('bigquery', 'v2', http=http)

    loadTable(http, service)


if __name__ == '__main__':
    main(sys.argv)
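
As a side note on the request body above: hand-concatenating JSON makes it easy to send a value with the wrong type (for example a quoted "True" instead of a boolean). A minimal sketch of the same multipart layout, built with `json.dumps`, could look like the following. The function name `build_load_body`, its parameters, and the sample project/dataset/table names are hypothetical and only serve the illustration; the boundary `xxx` and the configuration keys are taken from the question.

```python
import json


def build_load_body(project_id, dataset_id, table_id, ndjson_text, boundary="xxx"):
    """Return a multipart/related body: job configuration first, then the data."""
    config = {
        "configuration": {
            "load": {
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "autodetect": True,  # serialized as the JSON boolean true
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }
    parts = [
        "--" + boundary,
        "Content-Type: application/json; charset=UTF-8",
        "",
        json.dumps(config),
        "--" + boundary,
        "Content-Type: application/octet-stream",
        "",
        ndjson_text.rstrip("\n"),  # one JSON object per line
        "--" + boundary + "--",
        "",
    ]
    return "\r\n".join(parts)


# Placeholder identifiers and data, for illustration only.
body = build_load_body("my-project", "my_dataset", "my_table", '{"a": 1}\n{"b": 2}\n')
print(body)
```

The returned string would be passed as `body=` to `http.request(...)` together with the same `multipart/related; boundary=xxx` header used in the question.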

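I would suggest a query of the following kind (BigQuery Standard SQL), which assembles everything into one row at the end:

#standardSQL
SELECT
  ARRAY_AGG(a IGNORE NULLS) AS a,
  ARRAY_AGG(b IGNORE NULLS) AS b,
  ARRAY_AGG(c IGNORE NULLS) AS c
FROM `yourTable`

If you have an extra field that indicates which rows should be merged/grouped into one row (for example, some id), the query can look like this:

#standardSQL
SELECT
  id,
  ARRAY_AGG(a IGNORE NULLS) AS a,
  ARRAY_AGG(b IGNORE NULLS) AS b,
  ARRAY_AGG(c IGNORE NULLS) AS c
FROM `yourTable`
GROUP BY id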
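
To make the effect of the grouped query easier to see, here is a small plain-Python illustration of the same idea: collect the non-null values of each column per id. It does not talk to BigQuery at all; the helper `array_agg_ignore_nulls`, the column names `a`, `b`, `c`, and the sample rows are assumptions made for the example, standing in for the sparse rows the load job produces.

```python
from collections import defaultdict


def array_agg_ignore_nulls(rows, columns, key=None):
    """Group rows by `key` and collect each column's non-null values into a list."""
    groups = defaultdict(lambda: {col: [] for col in columns})
    for row in rows:
        # Without a key, every row falls into one single group.
        group = groups[row.get(key) if key else None]
        for col in columns:
            value = row.get(col)
            if value is not None:
                group[col].append(value)
    result = []
    for group_key, cols in groups.items():
        result.append({key: group_key, **cols} if key else dict(cols))
    return result


# Sparse rows: each loaded JSON object filled only some of the columns.
rows = [
    {"id": 1, "a": "x", "b": None, "c": None},
    {"id": 1, "a": None, "b": "y", "c": None},
    {"id": 1, "a": None, "b": None, "c": "z"},
    {"id": 2, "a": "p", "b": "q", "c": None},
    {"id": 2, "a": None, "b": None, "c": "r"},
]

merged = array_agg_ignore_nulls(rows, ["a", "b", "c"], key="id")
for row in merged:
    print(row)
# {'id': 1, 'a': ['x'], 'b': ['y'], 'c': ['z']}
# {'id': 2, 'a': ['p'], 'b': ['q'], 'c': ['r']}
```

Each id ends up as one row whose columns are arrays, which mirrors the shape returned by the `GROUP BY id` query; calling the helper without `key` corresponds to the first query, which folds the whole table into a single row.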
