Python: converting XML into a JSON structure that can be loaded into BigQuery

python json xml google-bigquery


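I'm learning Python on the job and would like help improving my solution.

I need to load XML data into BigQuery.

I have it working, but I'm not sure I've gone about it in a sensible way.

I call an API that returns an XML structure. I parse the XML with ElementTree and use tree.iter() to get each tag and its text. I print the tags and text with: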

for node in tree.iter():
    print(f'{node.tag}, {node.text}')

Returns:

Tag              Text
Responses        None
Response         None
ResponseId       393
ResponseText     Please respond “Has this loaded”
ResponseType     single
ResponseStatus   0

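The Responses tag appears only once per API call, but Response through ResponseStatus form a repeating group, with ResponseId as the key of each group. Each call returns fewer than 100 repeating groups.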
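To make the repeating-group structure concrete, here is a minimal sketch that collects one dict per Response element and indexes it by ResponseId. The XML string is a shortened, made-up sample with the same shape as the output above, and the name responses_by_id is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up sample with the same shape as the API response.
xml_text = (
    "<Responses>"
    "<Response><ResponseId>393</ResponseId><ResponseText>Has this loaded</ResponseText>"
    "<ResponseType>single</ResponseType><ResponseStatus>0</ResponseStatus></Response>"
    "<Response><ResponseId>394</ResponseId><ResponseText>Connection made</ResponseText>"
    "<ResponseType>single</ResponseType><ResponseStatus>0</ResponseStatus></Response>"
    "</Responses>"
)
root = ET.fromstring(xml_text)

# One dict per <Response> group, keyed by that group's ResponseId.
responses_by_id = {}
for response in root.findall("Response"):
    fields = {child.tag: child.text for child in response}
    responses_by_id[fields["ResponseId"]] = fields
```

Iterating over the Response elements directly keeps each group together, so no regrouping of a flat tag/text list is needed afterwards.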

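The response header also returns a key, Response_key, which is the parent of all the ResponseIds. My goal is to take this data, convert it to JSON, and stream it into BigQuery.
The table structure I need is:
ResponseKey, ResponseID, Response, ResponseText, ResponseType, ResponseStatus
The approach I'm using is: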

Loop with tree.iter() and build a list:

node_list = []
for node in tree.iter():
    node_list.append(node.tag)
    node_list.append(node.text)

Group the list using itertools (I found this step difficult).

Returns:

[['Responses', 'None'], ['None', 'ResponseId', '393', 'ResponseText', 'Please respond “Has this loaded” ', 'ResponseType', 'single', 'ResponseStatus', '0'], ['None', 'ResponseId', '394', 'ResponseText', 'Please confirm “Connection made” ', 'ResponseType', 'single', 'ResponseStatus', '0']]
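The grouping code itself is not shown in the question, so the following is only one possible way to do that step with itertools, not the asker's actual code. It assumes each new group starts at a Response tag, uses itertools.accumulate to number the groups, and then splits on that number with itertools.groupby. The XML string is a shortened, made-up sample.

```python
import itertools
import xml.etree.ElementTree as ET

# Shortened, made-up sample with the same shape as the API response.
xml_text = (
    "<Responses>"
    "<Response><ResponseId>393</ResponseId><ResponseText>Has this loaded</ResponseText>"
    "<ResponseType>single</ResponseType><ResponseStatus>0</ResponseStatus></Response>"
    "<Response><ResponseId>394</ResponseId><ResponseText>Connection made</ResponseText>"
    "<ResponseType>single</ResponseType><ResponseStatus>0</ResponseStatus></Response>"
    "</Responses>"
)
root = ET.fromstring(xml_text)

# (tag, text) pairs in document order, as iter() yields them.
pairs = [(node.tag, node.text) for node in root.iter()]

# Running count of 'Response' tags seen so far: 0 for the <Responses>
# wrapper, 1 for everything in the first group, 2 for the second, and so on.
group_ids = list(itertools.accumulate(int(tag == "Response") for tag, _ in pairs))

groups = []
for group_id, items in itertools.groupby(zip(group_ids, pairs), key=lambda x: x[0]):
    if group_id == 0:
        continue  # skip the <Responses> wrapper
    # Drop the <Response> marker itself and keep its children as a dict.
    groups.append({tag: text for _, (tag, text) in items if tag != "Response"})
```

This produces one dict per group rather than a flat list of alternating tags and values, which is closer to the row shape BigQuery expects.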

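Load the result into a pandas DataFrame and remove any double quotes, in case they cause problems in BigQuery.

Add ResponseKey as a column to the DataFrame.

Convert the DataFrame to JSON and pass it to load_table_from_json.
It works, but I'm not sure whether it's sensible.
Any suggestions for improvement would be much appreciated.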
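As a rough sketch of the last two steps: JSON encoding escapes double quotes inside string values, so stripping them beforehand is probably unnecessary, and load_table_from_json accepts a list of row dicts, so the rows may not need to pass through pandas at all. The helper load_rows and the table ID in the comment are hypothetical names; client is assumed to be a google.cloud.bigquery.Client, and the destination table is assumed to already exist with a matching schema.

```python
import json


def load_rows(client, table_id, rows):
    """Load a list of row dicts into BigQuery and wait for the job.

    `client` is assumed to be a google.cloud.bigquery.Client and
    `table_id` a string such as "my-project.my_dataset.responses"
    (a hypothetical name).
    """
    job = client.load_table_from_json(rows, table_id)
    return job.result()  # blocks until the load job finishes


# Double quotes in the text survive a JSON round trip because they are escaped.
row = {"ResponseId": "393", "ResponseText": 'Please respond "Has this loaded"'}
encoded = json.dumps(row)

# Hypothetical usage, requires the google-cloud-bigquery package and credentials:
# from google.cloud import bigquery
# load_rows(bigquery.Client(), "my-project.my_dataset.responses", [row])
```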
Here is a sample of the XML:

{"GetResponses":"<Responses><Response><ResponseId>393938<\/ResponseId><ResponseText>Please respond to the following statement:\"The assigned task was easy to complete\"<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393939<\/ResponseId><ResponseText>Did you save your datafor later? Why\/why not?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393940<\/ResponseId><ResponseText>Did you notice how much it cost to find the item? How much was it?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393941<\/ResponseId><ResponseText>Did you select ‘signature on form’? Why\/why not?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393942<\/ResponseId><ResponseText>Was it easy to find thethe new page? Why\/why not?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393943<\/ResponseId><ResponseText>Please enter your email. 
So that we can track your responses, we need you to provide this for each task.<\/ResponseText><ResponseShortCode>email<\/ResponseShortCode><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393944<\/ResponseId><ResponseText>Why didn't you save your datafor later?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393945<\/ResponseId><ResponseText>Why did you save your datafor later?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393946<\/ResponseId><ResponseText>Did you save your datafor later?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393947<\/ResponseId><ResponseText>Why didn't you select 'signature on form'?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393948<\/ResponseId><ResponseText>Why did you select 'signature on form'?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>4444449<\/ResponseId><ResponseText>Did you select ‘signature on form’?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393950<\/ResponseId><ResponseText>Why wasn't it easy to find thethe new page?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393951<\/ResponseId><ResponseText>Was it easy to find thethe new 
page?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393952<\/ResponseId><ResponseText>Please enter your email addressSo that we can track your responses, we need you to provide this for each task<\/ResponseText><ResponseShortCode>email<\/ResponseShortCode><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><\/Responses>"}
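The sample above is a JSON object whose GetResponses value holds the XML as an escaped string, so it has to be decoded before ElementTree can parse it. A minimal sketch, using a shortened, made-up payload of the same shape:

```python
import json
import xml.etree.ElementTree as ET

# Shortened, made-up payload with the same shape as the sample:
# the XML is a JSON string with escaped slashes (\/) and quotes (\").
payload = (
    r'{"GetResponses":"<Responses><Response>'
    r'<ResponseId>393938<\/ResponseId>'
    r'<ResponseText>Please respond: \"Easy to complete\"<\/ResponseText>'
    r'<ResponseType>single<\/ResponseType>'
    r'<\/Response><\/Responses>"}'
)

# json.loads undoes the JSON escaping, leaving plain XML text.
xml_text = json.loads(payload)["GetResponses"]
root = ET.fromstring(xml_text)
first = root.find("Response")
```

If the real payload contains literal line breaks inside the string, as the pasted sample appears to, json.loads(payload, strict=False) may be needed to accept them.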

I'm not sure what the required output is, but here is one approach:

import xml.etree.ElementTree as ET
import json

p = r"d:\tmp.xml"
tree = ET.parse(p)
root = tree.getroot()

# Top-level dict: the root tag plus a list with one dict per Response.
json_dict = {}
json_dict[root.tag] = root.text
json_dict['response_list'] = []

for node in root:
    # Collect the child tags of each Response into a flat dict.
    tmp_dict = {}
    for response_info in node:
        tmp_dict[response_info.tag] = response_info.text
    json_dict['response_list'].append(tmp_dict)

with open(r'd:\out.json', 'w') as out_file:
    json.dump(json_dict, out_file)
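Building on the same idea, the sketch below produces flat rows in the column layout the question asks for, attaching the parent key to every row and serialising them as newline-delimited JSON. The function name responses_to_rows, the FIELDS list, and the key value "abc123" are illustrative assumptions, and the XML is a shortened, made-up sample.

```python
import json
import xml.etree.ElementTree as ET

# Column names taken from the question; other tags are ignored.
FIELDS = ["ResponseId", "ResponseText", "ResponseType", "ResponseStatus"]


def responses_to_rows(xml_text, response_key):
    """Return one flat dict per <Response>, each carrying the parent key."""
    root = ET.fromstring(xml_text)
    rows = []
    for response in root.iter("Response"):
        row = {"ResponseKey": response_key}
        for field in FIELDS:
            # findtext returns None when the tag is absent.
            row[field] = response.findtext(field)
        rows.append(row)
    return rows


xml_text = (
    "<Responses>"
    "<Response><ResponseId>393938</ResponseId><ResponseText>Easy to complete?</ResponseText>"
    "<ResponseType>single</ResponseType><ResponseStatus>0</ResponseStatus></Response>"
    "<Response><ResponseId>393943</ResponseId><ResponseText>Please enter your email.</ResponseText>"
    "<ResponseShortCode>email</ResponseShortCode>"
    "<ResponseType>text</ResponseType><ResponseStatus>1</ResponseStatus></Response>"
    "</Responses>"
)
rows = responses_to_rows(xml_text, response_key="abc123")  # hypothetical key

# One JSON object per line.
ndjson = "\n".join(json.dumps(row) for row in rows)
```

Fixing the column list up front means every row has the same keys, even when a Response carries extra tags such as ResponseShortCode.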

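Can you add a sample XML input and the JSON output? It looks like you could iterate over the XML and copy it into JSON without all the steps in the middle.

Hi @trigonom, thanks for taking a look. I've added the XML and the JSON.

Thanks for your help, this is a more elegant solution. It gave me the right data structure for streaming into BigQuery.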
node_list = []
for node in tree.iter():
    node_list.append(node.tag)
    node_list.append(node.text)
json_format = json.dumps(node_list)
print(json_format)

["Responses", null, "Response", null, "ResponseId", "393938", "ResponseText", "Please respond to the following statement:\"The assigned task was easy to complete\"", "ResponseType", "single", "ResponseStatus", "0", "ExtendedType", "0"]
