Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/287.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
用Python从XML获取数据_Python_Xml - Fatal编程技术网

用Python从XML获取数据

用Python从XML获取数据,python,xml,Python,Xml,我试图理解如何使用Python从XML文件中提取某些数据 目前,我正在从API中提取信息并获取XML文件,但我希望直接从XML中获取特定信息 从我所能找到的来看,元素树似乎是答案,但我发现它很难理解,而且我真的不确定它是创建解决方案的正确方法 我在下面留下了用于获取XML数据的代码,以及它提供给我的一个简短XML文件(只留下了需要提取的重要部分) 多谢各位 import requests #Import routes routes=[] class routesClass: d

我试图理解如何使用Python从XML文件中提取某些数据

目前,我正在从API中提取信息并获取XML文件,但我希望直接从XML中获取特定信息

从我所能找到的来看,元素树似乎是答案,但我发现它很难理解,而且我真的不确定它是创建解决方案的正确方法

我在下面留下了用于获取XML数据的代码,以及它提供给我的一个简短XML文件(只留下了需要提取的重要部分)

多谢各位

import requests


#Import routes
routes=[]



class routesClass:
    def __init__(self,name,url):#,start,end,offset,rwe,al):
        self.n=name
        self.u=url
        #self.s=start
        #self.e=end
        #self.o=offset
        #self.r=rwe
        #self.a=al

#Add example route
testRoute1=routesClass("EasternFwy-Hoddle/Johnston","https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.7999012967757,144.99318476311566:?routeType=shortest&key=SECRETKEY&computeTravelTimeFor=all")
routes.append(testRoute1)
#routes.append(testRoute2)

print(routes[0].u)
还有XML之类的东西

<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<leg>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>

5144
764
0
2017-12-28:14:42:14+11:00
2017-12-28:14:54:58+11:00
478
764
764
806
67
0
2017-12-28:14:42:14+11:00
2017-12-28814:43:21+11:00
59
67
67

我建议使用lxml。在我看来,浏览xml树比浏览元素树更容易。下面是如何使用该模块的示例

示例
以xml为例,我将使用lxml解析它。如果将代码保存为example.xml和xmlparse.py

example.xml-您提供的xml格式不正确

  • 它没有将两个摘要部分分组的父xml标记
  • 在两个汇总部分中间有一个随机的<代码> <代码>标签。
这两个问题不允许对其进行解析,因此我删除了
标记,并将两个摘要部分分组在
标记中。下面是XML

<parent>
    <summary>
        <lengthInMeters>5144</lengthInMeters>
        <travelTimeInSeconds>764</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2017-12-28T14:42:14+11:00</departureTime>
        <arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
    </summary>
    <summary>
        <lengthInMeters>806</lengthInMeters>
        <travelTimeInSeconds>67</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2017-12-28T14:42:14+11:00</departureTime>
        <arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
    </summary>
</parent>
输出——如果保存xmlparse.py的代码并保存example.xml文件中提供的更新后的xml,则运行脚本时将收到以下输出:

lengthInMeters => 5144
******** Do something with  travelTimeInSeconds  :  764
travelTimeInSeconds => 764
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:54:58+11:00
noTrafficTravelTimeInSeconds => 478
historicTrafficTravelTimeInSeconds => 764
liveTrafficIncidentsTravelTimeInSeconds => 764
lengthInMeters => 806
travelTimeInSeconds => 67
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:43:21+11:00
noTrafficTravelTimeInSeconds => 59
historicTrafficTravelTimeInSeconds => 67
liveTrafficIncidentsTravelTimeInSeconds => 67

你将如何用python编写一个脚本来获取这段代码?@MichaelHolborn我用一个工作示例更新了答案。我希望这会有所帮助。出色的工作-现在查看它并努力理解您的解决方案。节日快乐!唯一让我有点困惑的是,我想直接从HTML链接解析-因此我没有下载文件。然后我将导入urllib.request并将上面脚本中的open语句修改为类似“with urllib.request.urlopen(“”)as fobj:”。当然,这取决于您使用的url请求库,但希望这能让您了解如何打开可通过url检索的远程xml文件。
lengthInMeters => 5144
******** Do something with  travelTimeInSeconds  :  764
travelTimeInSeconds => 764
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:54:58+11:00
noTrafficTravelTimeInSeconds => 478
historicTrafficTravelTimeInSeconds => 764
liveTrafficIncidentsTravelTimeInSeconds => 764
lengthInMeters => 806
travelTimeInSeconds => 67
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:43:21+11:00
noTrafficTravelTimeInSeconds => 59
historicTrafficTravelTimeInSeconds => 67
liveTrafficIncidentsTravelTimeInSeconds => 67