Parsing XML data with multiple roots in Python

I am making an API call that returns multiple XML responses like this:

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> ABC </Name>
                <ID> 123 </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> DEF </Name>
                <ID> 456 </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> GHI </Name>
                <ID> 789 </ID>
        </Action>
</BESAPI>
However, because there are multiple roots, I get an error. How can I parse this?


Edit: What I mean is that actionidlist seems to contain only the last ID, not the others.

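One alternative is to read the file with Spark and the spark-xml package:

# Assumes a Databricks-style notebook where `sc` (the SparkContext) and
# display() are predefined and the spark-xml package is installed.
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
file = "filepath/<xml_file_name.xml>"
schema_path = "filepath/<xml_schema_name.xml>"

"""
Sample contents of the file at schema_path, used only to infer the schema:

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> string </Name>
                <ID> INT </ID>
        </Action>
</BESAPI>

<?xml version="1.0" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="BESAPI.xsd">
        <Action Resource="https://www.example.com">
                <Name> string </Name>
                <ID> INT </ID>
        </Action>
</BESAPI>
"""

# rowTag names the element that becomes one row. Resource is an attribute,
# not an element, so BESAPI is used here to match the Action.Name and
# Action.ID columns selected below.
df_schema = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='BESAPI').load(schema_path)
# Load the data file, reusing the schema inferred from the sample
df = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='BESAPI').load(file, schema=df_schema.schema)
# display(df)
df.createOrReplaceTempView("temptable")
structured_df = sqlContext.sql("select concat_ws(', ', Action.Name) as Name, concat_ws(', ', Action.ID) as ID from temptable")
display(structured_df)

ET.fromstring parses only a single XML document. If you try to parse the entire input data and it has multiple roots, you get the following error: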

xml.etree.ElementTree.ParseError: junk after document element: line 9, column 0
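
The failure can be reproduced without calling the API. The sketch below is only an illustration: two_docs is shortened, made-up sample data shaped like the response in the question, and try_parse is a helper name introduced here. It feeds two concatenated documents to ET.fromstring and reports the parser's message.

```python
import xml.etree.ElementTree as ET

# Two complete documents back to back, mimicking the response in the question
# (shortened sample data, not the real API output)
two_docs = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Action><ID> 123 </ID></Action></BESAPI>\n'
    '\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Action><ID> 456 </ID></Action></BESAPI>\n'
)

# A single well-formed document for comparison
one_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Action><ID> 123 </ID></Action></BESAPI>\n'
)


def try_parse(data):
    """Return the parser's error message, or None if parsing succeeds."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(try_parse(one_doc))   # parses cleanly, so this prints None
print(try_parse(two_docs))  # prints the ParseError message
```

The line and column in the message depend on where the second document starts, so they will differ from the numbers shown above.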
So I suggest preprocessing the input data by splitting it into a list of XML responses, then parsing each response in turn:

import requests
import xml.etree.ElementTree as ET

url = ""
payload = ""
headers = {}
response = requests.post(url, headers=headers, data=payload)

# Split the input data into a list of byte strings (xml sections).
# response.content is bytes, so the accumulator and newline are bytes too.
xml_sections = [b'']
for line in response.content.splitlines():
    if line.strip():
        xml_sections[-1] += line + b'\n'
    else:
        # A blank line marks the boundary between two documents
        xml_sections.append(b'')

# Parse each XML section separately
actionidlist = []
for s in xml_sections:
    if not s.strip():
        continue  # skip empty sections left by consecutive or trailing blank lines
    root = ET.fromstring(s)
    for elem in root.iter('Action'):
        for subelem in elem.iter('ID'):
            actionidlist.append(subelem.text)
print(actionidlist)
This produces the following output:

[' 123 ', ' 456 ', ' 789 ']
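
Splitting on blank lines relies on the responses being separated by an empty line. A variation, sketched here under the assumption that every response starts with its own `<?xml` declaration, is to split at the declarations instead. The names raw, split_documents and extract_ids are hypothetical, and raw is shortened sample data standing in for response.content. Splitting on a zero-width match requires Python 3.7 or later.

```python
import re
import xml.etree.ElementTree as ET

# Sample data standing in for response.content (an assumption for illustration);
# note there is no blank line between the two documents
raw = b"""<?xml version="1.0" encoding="UTF-8"?>
<BESAPI>
        <Action Resource="https://www.example.com">
                <Name> ABC </Name>
                <ID> 123 </ID>
        </Action>
</BESAPI>
<?xml version="1.0" encoding="UTF-8"?>
<BESAPI>
        <Action Resource="https://www.example.com">
                <Name> DEF </Name>
                <ID> 456 </ID>
        </Action>
</BESAPI>
"""


def split_documents(data):
    """Split concatenated XML documents at each '<?xml' declaration."""
    # The lookahead keeps each declaration attached to the document it starts
    parts = re.split(rb'(?=<\?xml\s)', data)
    # Drop the empty piece produced before the first declaration
    return [p for p in parts if p.strip()]


def extract_ids(data):
    """Collect the stripped text of every ID under an Action, across all documents."""
    ids = []
    for doc in split_documents(data):
        root = ET.fromstring(doc)
        for action in root.iter('Action'):
            for id_elem in action.iter('ID'):
                ids.append(id_elem.text.strip())
    return ids


print(extract_ids(raw))
```

Unlike the output above, the IDs here are stripped of surrounding whitespace; drop the .strip() call to keep the text exactly as it appears in the XML.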

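Can you show the imports and the parsing in your code? For example, we don't know whether you are using the standard xml module or lxml. Also, you say you get an error but you don't show it. Does it happen in the parsing phase, or when calling root.iter? Please include the full stack trace.

Wrap the response in a single root element to make it well-formed XML.

@joao I have edited the question.

I would read the API documentation carefully. Are you sending multiple parameters? It is hard to believe that an API would return a malformed XML response. Is it embedded in a larger XML document? Contact the maintainers.

Great! Splitting the XML response worked! Thank you very much.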
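
The suggestion in the comments to wrap the response in a single root element could look roughly like the sketch below. It is an illustration, not the commenter's code: wrap_in_root and the Responses tag are names invented here, and sample is shortened made-up data. The XML declarations have to be removed first, because a declaration is only allowed at the very start of a document.

```python
import re
import xml.etree.ElementTree as ET


def wrap_in_root(text, root_tag='Responses'):
    """Strip every XML declaration and wrap what remains in one root element."""
    body = re.sub(r'<\?xml\s[^>]*\?>', '', text)
    return '<{0}>{1}</{0}>'.format(root_tag, body)


# Shortened sample data shaped like the response in the question
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Action><Name> ABC </Name><ID> 123 </ID></Action></BESAPI>\n'
    '\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Action><Name> DEF </Name><ID> 456 </ID></Action></BESAPI>\n'
)

# The wrapped text is a single well-formed document, so one parse is enough
root = ET.fromstring(wrap_in_root(sample))
ids = [elem.text.strip() for elem in root.iter('ID')]
print(ids)
```

This parses everything in one pass, at the cost of editing the text with a regular expression before parsing; splitting the response leaves each document untouched.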