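Converting a complex XML file to a DataFrame/CSV - Python

I am currently trying to convert a complex XML file to CSV or a pandas DataFrame. I have no experience with the XML data format, and none of the code suggestions I found online work for me. Can anyone help me with this?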

python xml pandas

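The data contains many elements that I don't need, so I have left them out here.

For privacy reasons I won't upload the original data, but I will share what the structure looks like: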

<RefData>
  <Attributes>
    <Id>1011</Id>
    <FullName>xxxx</FullName>
    <ShortName>xx</ShortName>
    <Country>UK</Country>
    <Currency>GBP</Currency>
  </Attributes>
  <PolicyID>000</PolicyID>
  <TradeDetails>
    <UniqueTradeId>000</UniqueTradeId>
    <Booking>UK</Booking>
    <Date>12/2/2019</Date>
  </TradeDetails>
</RefData>
<RefData>
  <Attributes>
    <Id>1012</Id>
    <FullName>xxx2</FullName>
    <ShortName>x2</ShortName>
    <Country>UK</Country>
    <Currency>GBP</Currency>
  </Attributes>
  <PolicyID>002</PolicyID>
  <TradeDetails>
    <UniqueTradeId>0022</UniqueTradeId>
    <Booking>UK</Booking>
    <Date>12/3/2019</Date>
  </TradeDetails>
</RefData>



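I would sincerely appreciate any help I can get. Thanks a mil.

One correction regarding the input XML file: it must contain a single main (root) element, of any name, which holds your RefData elements.
So the input file actually contains: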
<Main>
  <RefData>
    ...
  </RefData>
  <RefData>
    ...
  </RefData>
</Main>
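
To see why the single root element matters, here is a minimal sketch using only the standard library's xml.etree.ElementTree (not the lxml parser used in this answer). The abbreviated sample data and the helper name parses are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Two top-level <RefData> elements, as in the question (abbreviated).
fragment = """
<RefData><PolicyID>000</PolicyID></RefData>
<RefData><PolicyID>002</PolicyID></RefData>
"""

def parses(text):
    """Return True if `text` is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Wrap the fragment in a single root element; the name "Main" is arbitrary.
wrapped = "<Main>" + fragment + "</Main>"

fragment_ok = parses(fragment)   # False: more than one top-level element
wrapped_ok = parses(wrapped)     # True
record_count = len(ET.fromstring(wrapped).findall("RefData"))  # 2
```

If the real file starts with an `<?xml ...?>` declaration, the opening root tag has to be placed after that declaration rather than before it.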

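Then I noticed that the whole parsed XML tree is not actually needed,
so the usual scheme is:

- read the content of each element as soon as it has been parsed,
- save the content (text) of its child elements in some intermediate data structure (I chose a list of dictionaries),
- delete the source XML element (it is no longer needed),
- after the reading loop, create the resulting DataFrame from the intermediate data structure.

So my code looks like this: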
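import pandas as pd
from lxml import etree as et   # iterparse(tag=...) is an lxml feature

rows = []
# Stream the file: each RefData element is yielded once it is fully parsed
for _, elem in et.iterparse('RefData.xml', tag='RefData'):
    # findtext returns the text of the first match, or None if it is missing
    rows.append({
        'id':            elem.findtext('Attributes/Id'),
        'fullname':      elem.findtext('Attributes/FullName'),
        'shortname':     elem.findtext('Attributes/ShortName'),
        'country':       elem.findtext('Attributes/Country'),
        'currency':      elem.findtext('Attributes/Currency'),
        'Policy ID':     elem.findtext('PolicyID'),
        'UniqueTradeId': elem.findtext('TradeDetails/UniqueTradeId'),
        'Booking':       elem.findtext('TradeDetails/Booking'),
        'Date':          elem.findtext('TradeDetails/Date'),
    })
    # Release the processed element so memory use stays flat on large files
    elem.clear()
    elem.getparent().remove(elem)
df = pd.DataFrame(rows)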
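
If lxml is not available, roughly the same streaming scheme can be sketched with the standard library. This is an assumption-laden variant, not the code from the answer: xml.etree.ElementTree.iterparse has no tag= argument, so the tag is checked by hand, and an in-memory sample with only a few of the fields stands in for RefData.xml.

```python
import io
import xml.etree.ElementTree as ET

# Small in-memory stand-in for RefData.xml (structure taken from the question).
xml_bytes = b"""<Main>
  <RefData>
    <Attributes><Id>1011</Id><Country>UK</Country></Attributes>
    <PolicyID>000</PolicyID>
    <TradeDetails><UniqueTradeId>000</UniqueTradeId></TradeDetails>
  </RefData>
  <RefData>
    <Attributes><Id>1012</Id><Country>UK</Country></Attributes>
    <PolicyID>002</PolicyID>
    <TradeDetails><UniqueTradeId>0022</UniqueTradeId></TradeDetails>
  </RefData>
</Main>"""

rows = []
# The standard-library iterparse has no tag= filter, so check elem.tag by hand.
for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag != "RefData":
        continue
    rows.append({
        "id": elem.findtext("Attributes/Id"),
        "country": elem.findtext("Attributes/Country"),
        "Policy ID": elem.findtext("PolicyID"),
        "UniqueTradeId": elem.findtext("TradeDetails/UniqueTradeId"),
    })
    elem.clear()  # drop the children of the record that was just read
```

The resulting rows list can be passed to pd.DataFrame(rows) in the same way. Note that elem.clear() leaves an empty RefData element attached to the root, so this variant frees most, but not all, of the memory per record.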

To fully understand the details, look up the documentation for lxml and for
each method used.
For your sample data, the result is:
     id fullname shortname country currency Policy ID UniqueTradeId Booking      Date
0  1011     xxxx        xx      UK      GBP       000           000      UK 12/2/2019 
1  1012     xxx2        x2      UK      GBP       002          0022      UK 12/3/2019

The last step you will probably want to perform is saving the above DataFrame to a CSV
file, but I suppose you know how to do that.
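
For completeness, a minimal sketch of that CSV step with pandas, using a few columns from the sample output. One thing worth knowing with this data: values such as the Policy ID 000 are strings with leading zeros, and they only survive a round trip if the CSV is read back as text.

```python
import io
import pandas as pd

# A few columns from the sample output, kept as strings exactly as parsed.
rows = [
    {"id": "1011", "Policy ID": "000", "Date": "12/2/2019"},
    {"id": "1012", "Policy ID": "002", "Date": "12/3/2019"},
]
df = pd.DataFrame(rows)

# index=False leaves out the 0, 1, ... row labels.
# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# dtype=str keeps "000" from being read back as the integer 0.
round_trip = pd.read_csv(io.StringIO(csv_text), dtype=str)
```

To write a file instead, pass a path, for example df.to_csv('RefData.csv', index=False); the file name here is only an assumption.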

Another approach is to use lxml and xpath:
import pandas as pd
from lxml import etree

dat = """[your FIXED xml]"""   # the XML wrapped in a single root element
doc = etree.fromstring(dat)

# Container elements that hold no text of their own
to_delete = ["TradeDetails", "Attributes"]
body = doc.xpath('.//RefData')

# Column names: tags of all descendants of the first RefData, minus the containers
columns = [el.tag for el in body[0].xpath('.//*') if el.tag not in to_delete]

# One row per RefData: text of every non-container descendant, in document order
rows = []
for b in body:
    rows.append([item.text for item in b.xpath('.//*') if item.tag not in to_delete])

df = pd.DataFrame(rows, columns=columns)

The output is the DataFrame described in the question.
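
The xpath approach above matches values to columns purely by position, so a record with a missing optional element would produce a shorter row whose values no longer line up with the column names. A possible variation, sketched here with the standard library and sample data invented for illustration, keys each value by its tag and keeps only leaf elements, which also removes the need to list the container tags by hand.

```python
import xml.etree.ElementTree as ET

# The second record has no <Currency>, to show what happens with gaps.
xml_text = """<Main>
  <RefData>
    <Attributes><Id>1011</Id><Currency>GBP</Currency></Attributes>
    <PolicyID>000</PolicyID>
  </RefData>
  <RefData>
    <Attributes><Id>1012</Id></Attributes>
    <PolicyID>002</PolicyID>
  </RefData>
</Main>"""

rows = []
for ref in ET.fromstring(xml_text).iter("RefData"):
    # len(el) is the number of child elements, so len(el) == 0 selects leaves
    # and skips containers such as Attributes without listing them by name.
    rows.append({el.tag: el.text for el in ref.iter() if len(el) == 0})
```

Passing this list of dictionaries to pd.DataFrame(rows) leaves the missing Currency as NaN instead of shifting the other values. The sketch assumes leaf tag names are unique within one RefData; if a name repeats, the later value overwrites the earlier one.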

Hi, thanks for sharing, but I tried this two days ago and it did not work for me. The structure of the XML file used in that post is completely different from mine.