How to extract part of an XML file in Python
Tags: python, xml, elementtree
I have a large XML file that looks like the one below. Basically, I want to extract the part of the XML file that contains, for example, this element:
<ManagedElementId string = "rbs064841"/>
This means that after parsing I want to extract this part:
<SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs064841"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
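So the idea is to search the large XML file by ManagedElementId and, once it is found, extract the part that contains it, i.e. everything from <SubNetwork ...> to </SubNetwork>.
I know how to extract data from an XML file, but I don't know how to extract a part of the XML file itself. I am using Python's ElementTree.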
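Any suggestions would be very helpful.

Use find with an XPath expression to locate the ManagedElementId, then step up to the relative parent node, as in the code listed further down.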
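Hope this helps.

I'm using Python 2.6 and lxml is not installed; I can't install it because I don't have admin rights. Is there another way to do this with xml.etree?

@user3319356, the last code block is written precisely for the xml module. Let me tidy up the answer so you can see what to do.

Thanks, it works now. I need to parse a large file, so instead of the variable "s" I want to parse the file with: with open(r"C:\\Users\\etihkru\\Desktop\\big.xml", 'rt') as f: tree = ET.parse(f). But then tree1 = ET.fromstring(tree) doesn't work?

@user3319356, see my update. Just use root = ET.parse(...), then tree = root.getroot().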
s = '''<Model version = "1" importVersion = "12.2">
<Create>
<SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs064841"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType = "WRAN" userLabel = "AHPT78">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs04798"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType = "WRAN" userLabel = "AHPT4">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs04456"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
</Create>
</Model>'''
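# I'd prefer lxml, but you need to work with the xml module...
import xml.etree.ElementTree as ET

# fromstring() returns the root element (<Model>) directly
tree = ET.fromstring(s)
# The SubNetwork node you're interested in is the parent of the parent of
# ManagedElementId, so step up exactly two levels. Do not end the path with
# "/": ElementTree treats a trailing slash as "/*" (all children).
node = tree.find('.//ManagedElementId[@string="rbs064841"]/../..')
# Parenthesised print works on Python 2 and 3
print(ET.tostring(node))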
<SubNetwork networkType="WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType="CELLO">
<ManagedElementId string="rbs064841"/>
<primaryType type="RBS"/>
<managedElementType types=""/>
<associatedSite string="Site=site06484"/>
<nodeVersion string="W12B"/>
<platformVersion string="Cello 12.2"/>
<swVersion string=""/>
<vendorName string="ERICSSON"/>
<userDefinedState string=""/>
<managedServiceAvailability int="1"/>
<isManaged boolean="true"/>
<neMIMVersion string="vS.1.150"/>
<connectionStatus string="ON"/>
</ManagedElement>
</SubNetwork>
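If the XPath predicate or the ".." parent steps cause trouble (support for them may be limited in older ElementTree versions), the same lookup can be sketched with plain iteration. This is only an illustration, assuming Python 3: SAMPLE is a shortened, made-up document with the same shape as the one above, and find_subnetwork is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample with the same shape as the document above.
SAMPLE = """<Model version="1" importVersion="12.2">
  <Create>
    <SubNetwork networkType="WRAN" userLabel="AHPTUR14">
      <ManagedElement sourceType="CELLO">
        <ManagedElementId string="rbs064841"/>
      </ManagedElement>
    </SubNetwork>
    <SubNetwork networkType="WRAN" userLabel="AHPT78">
      <ManagedElement sourceType="CELLO">
        <ManagedElementId string="rbs04798"/>
      </ManagedElement>
    </SubNetwork>
  </Create>
</Model>"""


def find_subnetwork(root, element_id):
    """Return the SubNetwork that contains the given ManagedElementId, or None."""
    # Walk every SubNetwork and look inside it, so no parent lookup is needed.
    for subnetwork in root.iter("SubNetwork"):
        for managed_id in subnetwork.iter("ManagedElementId"):
            if managed_id.get("string") == element_id:
                return subnetwork
    return None


root = ET.fromstring(SAMPLE)
match = find_subnetwork(root, "rbs04798")
if match is not None:
    # encoding="unicode" (Python 3) makes tostring() return str instead of bytes
    print(ET.tostring(match, encoding="unicode"))
```

Because the search starts from each SubNetwork and looks downward, it returns the enclosing element without depending on how many levels separate it from the ManagedElementId.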
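# To read from a file instead of the string s:
# ET.parse() returns an ElementTree object; getroot() gives the root element to search
root = ET.parse('file.xml')
tree = root.getroot()
...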
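Since the file is described as large, a streaming variant may also be worth trying. The sketch below uses ET.iterparse so that the whole tree does not have to be held in memory at once; it assumes Python 3, and extract_subnetwork and DEMO_XML are hypothetical names and data made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for a large file on disk.
DEMO_XML = b"""<Model version="1" importVersion="12.2">
  <Create>
    <SubNetwork networkType="WRAN" userLabel="AHPTUR14">
      <ManagedElement sourceType="CELLO">
        <ManagedElementId string="rbs064841"/>
      </ManagedElement>
    </SubNetwork>
    <SubNetwork networkType="WRAN" userLabel="AHPT78">
      <ManagedElement sourceType="CELLO">
        <ManagedElementId string="rbs04798"/>
      </ManagedElement>
    </SubNetwork>
  </Create>
</Model>"""


def extract_subnetwork(source, element_id):
    """Stream through source and return the matching SubNetwork as text, or None."""
    # "end" events fire once an element and all of its children are complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "SubNetwork":
            continue
        for managed_id in elem.iter("ManagedElementId"):
            if managed_id.get("string") == element_id:
                return ET.tostring(elem, encoding="unicode")
        # Not the one we want: drop its contents to keep memory use low.
        elem.clear()
    return None


print(extract_subnetwork(io.BytesIO(DEMO_XML), "rbs04798"))
```

For a real file, pass an open binary file object, for example inside `with open("file.xml", "rb") as f:`. Note that the emptied SubNetwork elements stay attached to their parent, so memory use still grows slightly with the number of elements skipped.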