Python: how to parse nested XML values with multiple subgroups
I have a large .xml file, part of which looks like this:
<?xml version="1.0"?>
<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <repPeriod duration="1"/>
      <measTypes>73 74 574 75 35 36 </measTypes>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
      <measValue measObj="Group2">
        <measResults>512 78.175 50 119.5 0 0 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>
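The result I get reports only the last group.
The result I expect should include every group, and it needs to scale to multiple groups and measurement Ids.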
Using the BeautifulSoup library might make this easier. Before using it, you should install the following dependencies:
beautifulsoup4 = "4.9.3"
lxml = "^4.6.1"
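I'd rather not add other packages to my code, and parsing speed is also a concern.
@Mohsen I updated my answer; this should be enough.
Thanks, it works well on the sample data, but I have a problem with my original file. The error is:
it for it in mesurment if it.tag == "granPeriod"
StopIteration
@Mohsen This can happen when the tag does not exist for a particular measurement. In that case, just convert all the next statements to this format:
next((it for it in mesurment if it.tag == "granPeriod"), None)
That way it returns None if the tag is not found. You will then need an if statement that checks whether it exists, so that it does not fail at this line:
granPeriod.attrib["duration"]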
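The suggestion in the last comment can be sketched as follows. This is a minimal illustration rather than the answerer's exact code: the helper name `first_child` and the second, incomplete `Mesurment` (Id 56) are made up for the example. Note that the check uses `is not None`, because an ElementTree element without children evaluates as false.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second Mesurment deliberately lacks <granPeriod>.
XML = """<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
    </Mesurment>
    <Mesurment Id="56">
      <measValue measObj="Group1">
        <measResults>1 2 3 4 5 6 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>"""


def first_child(parent, tag):
    """Return the first direct child with the given tag, or None if absent."""
    return next((it for it in parent if it.tag == tag), None)


root = ET.fromstring(XML)
rows = []
for mesurment in root.iter("Mesurment"):
    granPeriod = first_child(mesurment, "granPeriod")
    rows.append({
        "Mesurment": mesurment.attrib["Id"],
        # Use None instead of raising StopIteration/KeyError when the tag is missing.
        "duration": granPeriod.attrib.get("duration") if granPeriod is not None else None,
    })

print(rows)
```

With this pattern a measurement that lacks the tag yields a row with an empty field instead of stopping the whole parse.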
from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup("""
<?xml version="1.0"?>
<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <repPeriod duration="1"/>
      <measTypes>73 74 574 75 35 36 </measTypes>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
      <measValue measObj="Group2">
        <measResults>512 78.175 50 119.5 0 0 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>
""", features="xml")

response = []
for tag in soup.data.measData:
    if not isinstance(tag, Tag):
        continue
    # please, update this dict with all the top level attributes you need
    data = {"duration": tag.granPeriod.attrs["duration"], }
    for measValue in tag:
        if not isinstance(measValue, Tag) or getattr(measValue, "measResults") is None:
            continue
        response.append({
            **data,
            "measObj": measValue.attrs["measObj"],
            "value": measValue.measResults.text
        })
print(response)
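Because the question mentions a large file and a preference for avoiding extra packages, a streaming variant based on the standard library's `ET.iterparse` may be worth considering. The sketch below is only an illustration under those assumptions: the function name `iter_measurements` is hypothetical, and the in-memory `io.BytesIO` source stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for the real file (assumption: same layout as the sample).
XML = b"""<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <repPeriod duration="1"/>
      <measTypes>73 74 574 75 35 36 </measTypes>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
      <measValue measObj="Group2">
        <measResults>512 78.175 50 119.5 0 0 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>"""


def iter_measurements(source):
    """Yield one dict per <measValue>, handling one <Mesurment> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Mesurment":
            continue
        gran = elem.find("granPeriod")
        for value in elem.findall("measValue"):
            yield {
                "Mesurment": elem.attrib.get("Id"),
                "duration": gran.attrib.get("duration") if gran is not None else None,
                "measObj": value.attrib.get("measObj"),
                "value": value.findtext("measResults"),
            }
        # Drop the children of the finished element to keep memory use low.
        elem.clear()


rows = list(iter_measurements(io.BytesIO(XML)))
print(rows)
```

`ET.iterparse` also accepts a file name, so `iter_measurements("new.xml")` should work the same way on a file on disk.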
import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('new.xml')
root = tree.getroot()

response = []
for mesurment in tree.iter("Mesurment"):
    granPeriod = next(
        it for it in mesurment if it.tag == "granPeriod"
    )
    measTypes = next(
        it for it in mesurment if it.tag == "measTypes"
    )
    measValues = [it for it in mesurment if it.tag == "measValue"]
    mesurment_data = {
        "Mesurment": mesurment.attrib["Id"],
        "duration": granPeriod.attrib["duration"],
        "endTime": granPeriod.attrib["endTime"],
        "CounterId": measTypes.text,
    }
    for value in measValues:
        response.append({
            **mesurment_data,
            "measObj": value.attrib["measObj"],
            "value": next(
                it.text for it in value if it.tag == "measResults"
            )
        })

df2 = pd.DataFrame(response)
print(df2)
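The code above keeps `measTypes` and `measResults` as whitespace-separated strings. If one row per counter is wanted instead (an assumption, since the expected table is not shown in the question), the two lists can be paired position by position. A minimal sketch using only the standard library:

```python
import xml.etree.ElementTree as ET

XML = """<data>
  <measData>
    <Mesurment Id="55">
      <granPeriod duration="1" endTime="2021-01-02"/>
      <repPeriod duration="1"/>
      <measTypes>73 74 574 75 35 36 </measTypes>
      <measValue measObj="Group1">
        <measResults>512 52.733 33.5 82 0 0 </measResults>
      </measValue>
      <measValue measObj="Group2">
        <measResults>512 78.175 50 119.5 0 0 </measResults>
      </measValue>
    </Mesurment>
  </measData>
</data>"""

root = ET.fromstring(XML)
rows = []
for mesurment in root.iter("Mesurment"):
    # Counter ids are shared by every measValue group of this Mesurment.
    counter_ids = (mesurment.findtext("measTypes") or "").split()
    for value in mesurment.findall("measValue"):
        results = (value.findtext("measResults") or "").split()
        # Assumption: the n-th result belongs to the n-th counter id.
        for counter_id, result in zip(counter_ids, results):
            rows.append({
                "Mesurment": mesurment.attrib.get("Id"),
                "measObj": value.attrib.get("measObj"),
                "CounterId": counter_id,
                "value": result,
            })

print(len(rows))
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` in the same way as `response` above. Values stay as strings here; converting them to numbers is left to the caller.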