Python: converting XML into a JSON structure that BigQuery can load
I'm learning Python on the job and need help improving my solution. I need to load XML data into BigQuery. I have it working, but I'm not sure I've done it in a sensible way. I call an API that returns an XML structure. I parse the XML with ElementTree and use tree.iter() to get the tags and text out of the XML. I print the tags and text with:
for node in tree.iter():
    print(f'{node.tag}, {node.text}')
This returns:
Tag Text
Responses None
Response None
ResponseId 393
ResponseText Please respond “Has this loaded”
ResponseType single
ResponseStatus 0
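The Responses tag appears only once per API call, but Response through ResponseStatus form a repeating group, with ResponseId as the key of each group. Each call returns fewer than 100 repeating groups.
A key, Response_key, is returned in the header; it is the parent of all the ResponseIds.
My goal is to take this data, convert it to JSON, and stream it into BigQuery.
The table structure I need is:
ResponseKey, ResponseID, Response, ResponseText, ResponseType, ResponseStatus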
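The target layout above amounts to one flat row per Response group, each carrying the parent key. A minimal sketch of that mapping follows, assuming the key has already been read from the header. The sample XML, the `responses_to_rows` helper, and the `"demo-key"` value are illustrative and not from the original post, and the unexplained `Response` column is left out.

```python
import xml.etree.ElementTree as ET

# Illustrative sample shaped like the printed output above (not real API data).
SAMPLE_XML = """
<Responses>
  <Response>
    <ResponseId>393</ResponseId>
    <ResponseText>Please respond "Has this loaded"</ResponseText>
    <ResponseType>single</ResponseType>
    <ResponseStatus>0</ResponseStatus>
  </Response>
  <Response>
    <ResponseId>394</ResponseId>
    <ResponseText>Please confirm "Connection made"</ResponseText>
    <ResponseType>single</ResponseType>
    <ResponseStatus>0</ResponseStatus>
  </Response>
</Responses>
"""

# Child tags copied into each row, taken from the column list above.
FIELDS = ("ResponseId", "ResponseText", "ResponseType", "ResponseStatus")


def responses_to_rows(xml_text, response_key):
    """Return one flat dict per <Response>, each tagged with the parent key."""
    root = ET.fromstring(xml_text)
    rows = []
    for response in root.findall("Response"):
        row = {"ResponseKey": response_key}
        for field in FIELDS:
            # findtext returns None when the child tag is missing.
            row[field] = response.findtext(field)
        rows.append(row)
    return rows


# "demo-key" is a hypothetical stand-in for the key returned in the header.
rows = responses_to_rows(SAMPLE_XML, response_key="demo-key")
```

Using a fixed field list keeps every row on the same columns even when a group omits a tag, which tends to suit a table with a fixed schema.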
The approach I have used is:
node_list = []
for node in tree.iter():
    node_list.append(node.tag)
    node_list.append(node.text)
[['Responses', 'None'], ['None', 'ResponseId', '393', 'ResponseText', 'Please respond “Has this loaded”', 'ResponseType', 'single', 'ResponseStatus', '0'], ['None', 'ResponseId', '394', 'ResponseText', 'Please confirm “Connection made”', 'ResponseType', 'single', 'ResponseStatus', '0']]
{"GetResponses":"<Responses><Response><ResponseId>393938<\/ResponseId><ResponseText>Please respond to the following statement:\"The assigned task was easy to complete\"<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393939<\/ResponseId><ResponseText>Did you save your datafor later? Why\/why not?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393940<\/ResponseId><ResponseText>Did you notice how much it cost to find the item? How much was it?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393941<\/ResponseId><ResponseText>Did you select ‘signature on form’? Why\/why not?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393942<\/ResponseId><ResponseText>Was it easy to find thethe new page? Why\/why not?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393943<\/ResponseId><ResponseText>Please enter your email. 
So that we can track your responses, we need you to provide this for each task.<\/ResponseText><ResponseShortCode>email<\/ResponseShortCode><ResponseType>text<\/ResponseType><ResponseStatus>1<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393944<\/ResponseId><ResponseText>Why didn't you save your datafor later?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393945<\/ResponseId><ResponseText>Why did you save your datafor later?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393946<\/ResponseId><ResponseText>Did you save your datafor later?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393947<\/ResponseId><ResponseText>Why didn't you select 'signature on form'?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393948<\/ResponseId><ResponseText>Why did you select 'signature on form'?<\/ResponseText><ResponseType>text<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>4444449<\/ResponseId><ResponseText>Did you select ‘signature on form’?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393950<\/ResponseId><ResponseText>Why wasn't it easy to find thethe new page?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><Response><ResponseId>393951<\/ResponseId><ResponseText>Was it easy to find thethe new 
page?<\/ResponseText><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>0<\/ExtendedType><\/Response><Response><ResponseId>393952<\/ResponseId><ResponseText>Please enter your email addressSo that we can track your responses, we need you to provide this for each task<\/ResponseText><ResponseShortCode>email<\/ResponseShortCode><ResponseType>single<\/ResponseType><ResponseStatus>0<\/ResponseStatus><ExtendedType>4<\/ExtendedType><\/Response><\/Responses>"}
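The payload above is a JSON object whose `GetResponses` value holds the XML as an escaped string, so it probably needs to be unwrapped before ElementTree can parse it. A small sketch of that step, using a shortened stand-in for the payload (the abbreviated string is illustrative, not the full API response):

```python
import json
import xml.etree.ElementTree as ET

# Shortened stand-in for the payload above: a JSON object whose
# "GetResponses" value is an XML document stored as a string.
payload = (
    '{"GetResponses":"<Responses><Response>'
    '<ResponseId>393938<\\/ResponseId>'
    '<ResponseType>single<\\/ResponseType>'
    '<\\/Response><\\/Responses>"}'
)

# json.loads decodes the escaped "<\/" sequences back into "</".
xml_text = json.loads(payload)["GetResponses"]

# The decoded string is plain XML and can be parsed directly.
root = ET.fromstring(xml_text)
response_ids = [r.findtext("ResponseId") for r in root.findall("Response")]
```

Parsing from the string with `ET.fromstring` avoids writing a temporary file just to call `ET.parse`.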
I'm not sure what the desired output is, but here is one way to do it:
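import json
import xml.etree.ElementTree as ET

p = r"d:\tmp.xml"
tree = ET.parse(p)
root = tree.getroot()

json_dict = {}
# root.text is typically None here, because <Responses> only holds child elements.
json_dict[root.tag] = root.text
json_dict['response_list'] = []

# Each child of the root is one <Response> group.
for node in root:
    tmp_dict = {}
    # Copy every child tag of the group (ResponseId, ResponseText, ...) into a dict.
    for response_info in node:
        tmp_dict[response_info.tag] = response_info.text
    json_dict['response_list'].append(tmp_dict)

with open(r'd:\out.json', 'w') as of:
    json.dump(json_dict, of)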
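The per-group dicts built this way are close to what BigQuery takes as rows. As a hedged variation, the sketch below writes one JSON object per line (newline-delimited JSON) and adds the parent key to each row. The `to_ndjson` helper, the sample XML, and `"demo-key"` are assumptions for illustration, not part of the answer above.

```python
import io
import json
import xml.etree.ElementTree as ET

# Illustrative sample using tags from the payload in the question.
SAMPLE_XML = (
    "<Responses>"
    "<Response><ResponseId>393938</ResponseId>"
    "<ResponseType>single</ResponseType>"
    "<ResponseStatus>0</ResponseStatus></Response>"
    "<Response><ResponseId>393939</ResponseId>"
    "<ResponseType>text</ResponseType>"
    "<ResponseStatus>1</ResponseStatus></Response>"
    "</Responses>"
)


def to_ndjson(xml_text, response_key, out):
    """Write one JSON object per <Response> to `out`; return the row count."""
    root = ET.fromstring(xml_text)
    count = 0
    for response in root:
        # One dict per group, keyed by child tag name.
        row = {child.tag: child.text for child in response}
        # Hypothetical parent key; in the question it comes from the header.
        row["ResponseKey"] = response_key
        out.write(json.dumps(row) + "\n")
        count += 1
    return count


buffer = io.StringIO()
written = to_ndjson(SAMPLE_XML, "demo-key", buffer)
lines = buffer.getvalue().splitlines()
```

Each line could then be loaded as newline-delimited JSON, or the row dicts could be passed to a streaming call such as the BigQuery Python client's `insert_rows_json`. That step is not shown because it needs credentials and an existing table.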
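Can you add a sample XML input and the JSON output? It looks like you could iterate over the XML and copy it to JSON without all the steps in the middle.
Hi @trigonom, thanks for taking a look. I've added the XML and JSON.
Thanks for your help, this is a much more elegant solution. It gives me the right data structure for streaming into BigQuery.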
import json

node_list = []
for node in tree.iter():
    node_list.append(node.tag)
    node_list.append(node.text)
json_format = json.dumps(node_list)
print(json_format)
["Responses", null, "Response", null, "ResponseId", "393938", "ResponseText", "Please respond to the following statement:\"The assigned task was easy to complete\"", "ResponseType", "single", "ResponseStatus", "0", "ExtendedType", "0"]