Getting data from XML with Python
I am trying to understand how to extract certain data from an XML file using Python. At the moment I pull information from an API and get an XML file back, but I would like to read specific values directly from the XML. From what I have been able to find, ElementTree seems to be the answer, but I find it hard to understand and I am really not sure it is the right way to build a solution. Below are the code I use to fetch the XML data and a short piece of the XML it gives me (trimmed to the important parts that need extracting). Thanks, everyone.
import requests

# Import routes
routes = []

class routesClass:
    def __init__(self, name, url):  # ,start,end,offset,rwe,al):
        self.n = name
        self.u = url
        #self.s=start
        #self.e=end
        #self.o=offset
        #self.r=rwe
        #self.a=al

# Add example route
testRoute1 = routesClass("EasternFwy-Hoddle/Johnston", "https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.7999012967757,144.99318476311566:?routeType=shortest&key=SECRETKEY&computeTravelTimeFor=all")
routes.append(testRoute1)
#routes.append(testRoute2)
print(routes[0].u)
And the XML looks something like this:
<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<leg>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
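I would suggest using lxml. In my opinion, navigating the XML tree with it is easier than with ElementTree. Here is an example of how to use the module.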
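The example takes the XML you posted and parses it with lxml. It assumes the XML is saved as example.xml and the code as xmlparse.py. The XML you provided is not well formed:
- It has no parent XML tag grouping the two summary sections.
- There is a stray <leg> tag between the two summary sections.
I removed the <leg> tag and grouped the two summary sections inside a <parent> tag. Here is the XML: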
<parent>
<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
</parent>
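The answer's xmlparse.py listing did not survive in this copy of the thread, so what follows is only a minimal sketch of a script in that spirit, not the original code. The function name `dump_summaries` is hypothetical, and the fallback to the standard-library `xml.etree.ElementTree` (which shares the calls used here with lxml) is an assumption added so the sketch runs even without lxml installed.

```python
try:
    from lxml import etree  # the library recommended in the answer
except ImportError:
    # Assumption: fall back to the standard library, whose API matches
    # lxml for the few calls used in this sketch.
    import xml.etree.ElementTree as etree


def dump_summaries(source):
    """Return 'tag => text' lines for every child of every <summary>.

    `source` may be a file name or a binary file-like object.
    """
    root = etree.parse(source).getroot()
    lines = []
    for summary in root.iter("summary"):
        for child in summary:
            if child.tag == "travelTimeInSeconds":
                # Placeholder for whatever should happen with this one value.
                lines.append(
                    "******** Do something with travelTimeInSeconds : " + child.text
                )
            lines.append(f"{child.tag} => {child.text}")
    return lines
```

Calling `print("\n".join(dump_summaries("example.xml")))` on the file above would print one `tag => value` line per element. Note that this sketch emits the "Do something" line for every summary, so its output would contain one more marker line than the output posted in the answer.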
Output: if you save the code as xmlparse.py and the updated XML above as example.xml, running the script gives the following output:
lengthInMeters => 5144
******** Do something with travelTimeInSeconds : 764
travelTimeInSeconds => 764
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:54:58+11:00
noTrafficTravelTimeInSeconds => 478
historicTrafficTravelTimeInSeconds => 764
liveTrafficIncidentsTravelTimeInSeconds => 764
lengthInMeters => 806
travelTimeInSeconds => 67
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:43:21+11:00
noTrafficTravelTimeInSeconds => 59
historicTrafficTravelTimeInSeconds => 67
liveTrafficIncidentsTravelTimeInSeconds => 67
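How would you write a script in Python to get this code?
@MichaelHolborn I updated the answer with a working example. I hope this helps.
Excellent work. I am looking at it now and trying to understand your solution. Happy holidays! The only thing that still confuses me a little is that I want to parse directly from an HTML link, so I am not downloading the file.
Then I would import urllib.request and change the open statement in the script above to something like "with urllib.request.urlopen("") as fobj:". Of course, this depends on which URL request library you use, but hopefully it gives you an idea of how to open a remote XML file that can be retrieved by URL.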
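The suggestion in the last comment can be sketched as follows, parsing the response body directly instead of saving a file first. This is an illustration under assumptions: `first_summary`, `local_name` and `fetch_summary` are hypothetical names, the standard-library parser is used so nothing extra needs installing, and namespace prefixes are stripped because a live API response may declare an XML namespace that the trimmed sample does not show.

```python
import urllib.request
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix from a tag, if one is present."""
    return tag.rsplit("}", 1)[-1]


def first_summary(xml_bytes):
    """Return the children of the first <summary> element as a dict."""
    root = ET.fromstring(xml_bytes)
    for elem in root.iter():  # document order, so the route-level summary comes first
        if local_name(elem.tag) == "summary":
            return {local_name(child.tag): child.text for child in elem}
    return {}


def fetch_summary(url):
    """Download the XML at `url` and extract its first summary."""
    with urllib.request.urlopen(url) as fobj:
        return first_summary(fobj.read())
```

With the route object from the question, `fetch_summary(routes[0].u)["travelTimeInSeconds"]` would then give the travel time as a string, provided the key in the URL is valid and the response has the shape shown above.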