Python: converting parsed XML with nested and missing child tags inside containers to a DataFrame

Tags: python, xml, python-2.7

My sample XML:

<RecordContainer RecordNumber = "1">
<catalog>
   <book id="bk101">
      <person>
         <author>Gambardella, Matthew</author>
         <personal_info>
            <age>40</age>
         </personal_info> 
      </person>
      <title>XML Developer's Guide</title>
      <description>
          <price>44.95</price>
          <publish_date>2000-10-01</publish_date>
      </description>
      <details> 
          <info>this is the guide to XML</info>
      </details>
   </book>
 </catalog>
</RecordContainer>
<RecordContainer RecordNumber = "2">
 <catalog>  
   <book id="bk102">
      <person>
        <author>Ralls, Kim</author>
      </person>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <description>
        <price>5.95</price>
        <publish_date>2000-12-16</publish_date>
      </description>
   </book>
</catalog>
</RecordContainer>

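Note that the XML above has nested child tags, and some of the nested tags are missing in some containers.

My expected output is a DataFrame containing all tags, filled with null wherever a tag's text is missing.

Code to parse the data: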

import xml.etree.ElementTree as ET
import pandas as pd

root = ET.fromstring("<root>"+ sample_data + "</root>")

records = []
containers = root.findall('.//RecordContainer')
for container in containers:
    entry = container.attrib
    book = container.find('.//catalog/book')
    entry.update(book.attrib)
    for child in list(book):
        entry[child.tag] = child.text
    records.append(entry)

df = pd.DataFrame(records)

The code above returns null where tags are missing, and the values do not line up with the column names.
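One possible way to get the expected output, offered as a sketch rather than a definitive answer: the loop above only visits the direct children of `book`, so for tags such as `person` or `description` it reads `child.text`, which is just the whitespace before the nested tags. Walking every descendant with `Element.iter()` and keeping only leaf elements (those with no child tags) gives one flat dict per container, and `pd.DataFrame` then fills the missing keys with NaN. The helper name `flatten_record` is made up for illustration. It assumes leaf tag names are unique within a container and do not clash with attribute names; otherwise later values overwrite earlier ones.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# The sample XML from the question (several top-level containers, no single root).
sample_data = """
<RecordContainer RecordNumber = "1">
<catalog>
   <book id="bk101">
      <person>
         <author>Gambardella, Matthew</author>
         <personal_info>
            <age>40</age>
         </personal_info>
      </person>
      <title>XML Developer's Guide</title>
      <description>
          <price>44.95</price>
          <publish_date>2000-10-01</publish_date>
      </description>
      <details>
          <info>this is the guide to XML</info>
      </details>
   </book>
 </catalog>
</RecordContainer>
<RecordContainer RecordNumber = "2">
 <catalog>
   <book id="bk102">
      <person>
        <author>Ralls, Kim</author>
      </person>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <description>
        <price>5.95</price>
        <publish_date>2000-12-16</publish_date>
      </description>
   </book>
</catalog>
</RecordContainer>
"""


def flatten_record(container):
    """Hypothetical helper: flatten one RecordContainer into a single dict."""
    row = dict(container.attrib)      # copy, so the parsed tree is not mutated
    for elem in container.iter():     # the container itself, then all descendants
        if elem is container:
            continue
        row.update(elem.attrib)       # e.g. the id attribute of <book>
        if len(elem) == 0:            # leaf element: it has no child tags
            row[elem.tag] = (elem.text or "").strip() or None
    return row


# Wrap the fragments in a dummy root so the string parses as one document.
root = ET.fromstring("<root>" + sample_data + "</root>")
records = [flatten_record(c) for c in root.findall("RecordContainer")]

# Keys missing from a record (age, info, genre) become NaN in the DataFrame.
df = pd.DataFrame(records)
print(df)
```

If the same leaf tag can occur under different parents, a variant that builds the column name from the parent path (for example `person_author`) would avoid collisions, at the cost of longer column names.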

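What have you tried so far? Surely you learned something from that?

@mzjn Yes, with what I learned from the above I was able to handle the earlier XML, but because some nested tags are missing in some containers, I am unable to handle it.

@hue please check..

@balderman could you take a look as well?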