Python: How to parse nested XML values for multiple subgroups

I have a large .xml file; part of it looks like this:

<?xml version="1.0"?>
<data>
    <measData>
        <Mesurment Id="55">
            <granPeriod duration="1" endTime="2021-01-02"/>
            <repPeriod duration="1"/>
            <measTypes>73 74 574 75 35 36 </measTypes>
            <measValue measObj="Group1">
                <measResults>512 52.733 33.5 82 0 0 </measResults>
            </measValue>
            <measValue measObj="Group2">
                <measResults>512 78.175 50 119.5 0 0 </measResults>
            </measValue>
        </Mesurment>
    </measData>
</data>
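The result looks like this:

It reports only the last group. The result I expect is shown below; it needs to scale to multiple groups and multiple Mesurment Ids.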


Using the BeautifulSoup library may make this easier. Before using it, install the following dependencies:

beautifulsoup4 = "4.9.3"
lxml = "^4.6.1"

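I'd rather not add other packages to my code, and parsing speed with other packages is also a concern.

@Mohsen I updated my answer; that should suffice.

Thanks, it works well on the extended data, but I have a problem with my original file. The error is: it for it in mesurment if it.tag == "granPeriod" StopIteration

@Mohsen This can happen if the tag does not exist for a particular Mesurment. In that case, convert every next statement to this form: next((it for it in mesurment if it.tag == "granPeriod"), None). If no tag is found, it returns None. You will then need an if statement that checks whether the element exists, so that it does not fail on this line: granPeriod.attrib["duration"]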
from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup("""
<?xml version="1.0"?>
<data>
    <measData>
        <Mesurment Id="55">
            <granPeriod duration="1" endTime="2021-01-02"/>
            <repPeriod duration="1"/>
            <measTypes>73 74 574 75 35 36 </measTypes>
            <measValue measObj="Group1">
                <measResults>512 52.733 33.5 82 0 0 </measResults>
            </measValue>
            <measValue measObj="Group2">
                <measResults>512 78.175 50 119.5 0 0 </measResults>
            </measValue>
        </Mesurment>
    </measData>
</data>
""", features="xml")

response = []

for tag in soup.data.measData:
    if not isinstance(tag, Tag):
        continue
    
    # Top-level attributes shared by every row of this Mesurment; extend as needed
    data = {"duration": tag.granPeriod.attrs["duration"]}

    for measValue in tag:
        # Skip whitespace nodes and child tags that have no measResults element
        if not isinstance(measValue, Tag) or measValue.measResults is None:
            continue

        response.append({
            **data,
            "measObj": measValue.attrs["measObj"],
            "value": measValue.measResults.text
        })

print(response)

Updated version using xml.etree.ElementTree from the standard library (pandas is used only to build the final table):

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('new.xml')
root = tree.getroot()

response = []

for mesurment in tree.iter("Mesurment"):
    granPeriod = next(
        it for it in mesurment if it.tag == "granPeriod"
    )
    measTypes = next(
        it for it in mesurment if it.tag == "measTypes"
    )
    measValues = [it for it in mesurment if it.tag == "measValue"]

    mesurment_data = {
        "Mesurment": mesurment.attrib["Id"],
        "duration": granPeriod.attrib["duration"],
        "endTime": granPeriod.attrib["endTime"],
        "CounterId": measTypes.text,
    }

    for value in measValues:
        response.append({
            **mesurment_data,
            "measObj": value.attrib["measObj"],
            "value": next(
                it.text for it in value if it.tag == "measResults"
            )
        })

df2 = pd.DataFrame(response)
print(df2)
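
The None-default form of next suggested in the comments could be applied to the ElementTree version roughly as follows. This is only a sketch under stated assumptions: the helper name extract_rows is hypothetical, the second Mesurment (Id="56") is invented sample data with no granPeriod tag, and missing values are reported as None rather than skipped.

```python
import xml.etree.ElementTree as ET

# Sample input. The first Mesurment is the data from the question; the second
# is made up for illustration and deliberately has no granPeriod element.
SAMPLE_XML = """<?xml version="1.0"?>
<data>
    <measData>
        <Mesurment Id="55">
            <granPeriod duration="1" endTime="2021-01-02"/>
            <repPeriod duration="1"/>
            <measTypes>73 74 574 75 35 36 </measTypes>
            <measValue measObj="Group1">
                <measResults>512 52.733 33.5 82 0 0 </measResults>
            </measValue>
            <measValue measObj="Group2">
                <measResults>512 78.175 50 119.5 0 0 </measResults>
            </measValue>
        </Mesurment>
        <Mesurment Id="56">
            <measTypes>73 74 </measTypes>
            <measValue measObj="Group1">
                <measResults>1 2 </measResults>
            </measValue>
        </Mesurment>
    </measData>
</data>
"""


def extract_rows(root):
    """Return one dict per measValue; absent tags yield None (hypothetical helper)."""
    rows = []
    for mesurment in root.iter("Mesurment"):
        # next(generator, None) returns None instead of raising StopIteration
        gran_period = next((it for it in mesurment if it.tag == "granPeriod"), None)
        meas_types = next((it for it in mesurment if it.tag == "measTypes"), None)

        # Compare with "is not None": an Element without children is falsy
        has_period = gran_period is not None
        base = {
            "Mesurment": mesurment.attrib.get("Id"),
            "duration": gran_period.attrib.get("duration") if has_period else None,
            "endTime": gran_period.attrib.get("endTime") if has_period else None,
            "CounterId": meas_types.text if meas_types is not None else None,
        }

        # findall with a bare tag name matches direct children only
        for value in mesurment.findall("measValue"):
            results = next((it.text for it in value if it.tag == "measResults"), None)
            rows.append({**base, "measObj": value.attrib.get("measObj"), "value": results})
    return rows


rows = extract_rows(ET.fromstring(SAMPLE_XML))
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as response above. Whether a Mesurment without granPeriod should be kept with None values or dropped entirely depends on the real file, so that choice is left open here.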