How to extract data from XML/SOAP in Python

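The UK National Gas system publishes a large amount of data that can be accessed from a SOAP server; a sample of the returned data (for LNG) is shown below. I have written the code that builds the request and handles the response, but I am stuck on how to extract the returned information. The aim is to load the data into a back-end database or a DataFrame.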

In previous code I simply used XPath to walk the XML, then iterated over the tags and extracted the child data. So I would like to extract:

GetPublicationDataWMResult, ApplicableAt, ApplicableFor, Value, ...
LNG Stock Level,2016-03-13T15:00:07Z, 2016-03-12T00:00:00Z, 7050.42286, ...
LNG Capacity,2016-03-13T15:00:07Z, 2016-03-12T00:00:00Z, 6515042480, ...

Attempts to walk the children with XPath (/Envelope/Body/GetPublicationDataWMResponse/GetPublicationDataWMResult/) fail.

If I clean up the response by applying a series of string removals, the logic works, but that is sub-optimal and bound to break in the future.

Sample code:

import requests
from lxml import objectify

def getXML():

    toDate = "2016-03-12"
    fromDate = "2016-03-12"
    dateType = "gasday"

    url="http://marketinformation.natgrid.co.uk/MIPIws-public/public/publicwebservice.asmx"
    headers = {'content-type': 'application/soap+xml; charset=utf-8'}

    body ="""<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
        <soap12:Body>
            <GetPublicationDataWM xmlns="http://www.NationalGrid.com/MIPI/">
                <reqObject>
                    <LatestFlag>Y</LatestFlag>
                    <ApplicableForFlag>Y</ApplicableForFlag>
                    <ToDate>%s</ToDate>
                    <FromDate>%s</FromDate>
                    <DateType>%s</DateType>
                    <PublicationObjectNameList>
                        <string>LNG Stock Level</string>
                        <string>LNG, Daily Aggregated Available Capacity, D+1</string>
                    </PublicationObjectNameList>
                </reqObject>
            </GetPublicationDataWM>
        </soap12:Body>
    </soap12:Envelope>""" % (toDate, fromDate,dateType)


    response = requests.post(url,data=body,headers=headers)

    return response.content

root = objectify.fromstring(getXML())

The returned XML:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <GetPublicationDataWMResponse
            xmlns="http://www.NationalGrid.com/MIPI/">
            <GetPublicationDataWMResult>
                <CLSMIPIPublicationObjectBE>
                    <PublicationObjectName>LNG Stock Level</PublicationObjectName>
                    <PublicationObjectData>
                        <CLSPublicationObjectDataBE>
                            <ApplicableAt>2016-03-13T15:00:07Z</ApplicableAt>
                            <ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
                            <Value>7050.42286</Value>
                            <GeneratedTimeStamp>2016-03-13T15:56:00Z</GeneratedTimeStamp>
                            <QualityIndicator></QualityIndicator>
                            <Substituted>N</Substituted>
                            <CreatedDate>2016-03-13T15:56:28Z</CreatedDate>
                        </CLSPublicationObjectDataBE>
                    </PublicationObjectData>
                </CLSMIPIPublicationObjectBE>
                <CLSMIPIPublicationObjectBE>
                    <PublicationObjectName>LNG Capacity</PublicationObjectName>
                    <PublicationObjectData>
                        <CLSPublicationObjectDataBE>
                            <ApplicableAt>2016-03-12T15:30:00Z</ApplicableAt>
                            <ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
                            <Value>6515042480</Value>
                            <GeneratedTimeStamp>2016-03-12T16:00:00Z</GeneratedTimeStamp>
                            <QualityIndicator></QualityIndicator>
                            <Substituted>N</Substituted>
                            <CreatedDate>2016-03-12T16:00:20Z</CreatedDate>
                        </CLSPublicationObjectDataBE>
                    </PublicationObjectData>
                </CLSMIPIPublicationObjectBE>
            </GetPublicationDataWMResult>
        </GetPublicationDataWMResponse>
    </soap:Body>
</soap:Envelope>

Using your existing code, I just added the following:

from bs4 import BeautifulSoup

res = getXML()

# html.parser lower-cases tag names, so search with st.lower()
soup = BeautifulSoup(res, 'html.parser')

searchTerms = ['PublicationObjectName', 'ApplicableAt', 'ApplicableFor', 'Value']
# LNG Stock Level,2016-03-13T15:00:07Z, 2016-03-12T00:00:00Z, 7050.42286, ...

for st in searchTerms:
    # find() returns only the first matching tag
    print(st, soup.find(st.lower()).contents[0], sep='\t')

Output:

PublicationObjectName   LNG Stock Level
ApplicableAt    2016-03-13T15:00:07Z
ApplicableFor   2016-03-12T00:00:00Z
Value   7050.42286
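
Two details of the snippet above are easy to miss: Python's html.parser lower-cases tag names (hence st.lower()), and find() returns only the first match, so only the first publication object is printed. The following is a minimal sketch of both points using only the standard library's html.parser.HTMLParser rather than BeautifulSoup. The class name FieldCollector and the trimmed SAMPLE string are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Trimmed copy of the response from the question (an assumption for this sketch).
SAMPLE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<GetPublicationDataWMResponse xmlns="http://www.NationalGrid.com/MIPI/">
<GetPublicationDataWMResult>
<CLSMIPIPublicationObjectBE>
<PublicationObjectName>LNG Stock Level</PublicationObjectName>
<PublicationObjectData><CLSPublicationObjectDataBE>
<ApplicableAt>2016-03-13T15:00:07Z</ApplicableAt>
<ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
<Value>7050.42286</Value>
</CLSPublicationObjectDataBE></PublicationObjectData>
</CLSMIPIPublicationObjectBE>
<CLSMIPIPublicationObjectBE>
<PublicationObjectName>LNG Capacity</PublicationObjectName>
<PublicationObjectData><CLSPublicationObjectDataBE>
<ApplicableAt>2016-03-12T15:30:00Z</ApplicableAt>
<ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
<Value>6515042480</Value>
</CLSPublicationObjectDataBE></PublicationObjectData>
</CLSMIPIPublicationObjectBE>
</GetPublicationDataWMResult>
</GetPublicationDataWMResponse>
</soap:Body>
</soap:Envelope>"""

# The parser reports tag names in lower case, so the lookup set is lower case too.
WANTED = {"publicationobjectname", "applicableat", "applicablefor", "value"}


class FieldCollector(HTMLParser):
    """Hypothetical helper: records the text of every wanted tag, not just the first."""

    def __init__(self):
        super().__init__()
        self.current = None   # wanted tag we are currently inside, if any
        self.found = []       # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.current = tag if tag in WANTED else None

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.append((self.current, data.strip()))


collector = FieldCollector()
collector.feed(SAMPLE)
collector.close()

for tag, text in collector.found:
    print(tag, text, sep="\t")
```

With BeautifulSoup itself, find_all() is the analogous call that returns every matching tag instead of only the first.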

This is a frequently asked question in the XML + XPath topic, and it involves XML with a default namespace.

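The XML element on which a default namespace is declared, and its descendant elements without a prefix, implicitly belong to that same default namespace. In an XPath expression, to reference an element in a namespace you need to use a prefix that has been mapped to the corresponding namespace URI. Using lxml, the code would look roughly like this: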

from lxml import etree

root = etree.fromstring(getXML())

# map prefix 'd' to the default namespace URI
ns = { 'd': 'http://www.NationalGrid.com/MIPI/'}

publication_objects = root.xpath('//d:CLSMIPIPublicationObjectBE', namespaces=ns)
for obj in publication_objects:
    name = obj.find('d:PublicationObjectName', ns).text

    data = obj.find('d:PublicationObjectData/d:CLSPublicationObjectDataBE', ns)
    applicable_at = data.find('d:ApplicableAt', ns).text
    applicable_for = data.find('d:ApplicableFor', ns).text
    # todo: extract other relevant data and process as needed
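
To make the prefix mapping concrete without a network call, here is a self-contained sketch that applies the same idea to a trimmed copy of the response from the question. It deliberately swaps in the standard library's xml.etree.ElementTree for lxml so that it runs without extra packages. The helper name extract_rows, the FIELDS tuple and the SAMPLE string are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question (an assumption for this sketch).
SAMPLE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <GetPublicationDataWMResponse xmlns="http://www.NationalGrid.com/MIPI/">
      <GetPublicationDataWMResult>
        <CLSMIPIPublicationObjectBE>
          <PublicationObjectName>LNG Stock Level</PublicationObjectName>
          <PublicationObjectData>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-03-13T15:00:07Z</ApplicableAt>
              <ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
              <Value>7050.42286</Value>
            </CLSPublicationObjectDataBE>
          </PublicationObjectData>
        </CLSMIPIPublicationObjectBE>
        <CLSMIPIPublicationObjectBE>
          <PublicationObjectName>LNG Capacity</PublicationObjectName>
          <PublicationObjectData>
            <CLSPublicationObjectDataBE>
              <ApplicableAt>2016-03-12T15:30:00Z</ApplicableAt>
              <ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
              <Value>6515042480</Value>
            </CLSPublicationObjectDataBE>
          </PublicationObjectData>
        </CLSMIPIPublicationObjectBE>
      </GetPublicationDataWMResult>
    </GetPublicationDataWMResponse>
  </soap:Body>
</soap:Envelope>"""

# Prefixes are local to this code; only the URIs have to match the document.
NS = {
    "soap": "http://www.w3.org/2003/05/soap-envelope",
    "d": "http://www.NationalGrid.com/MIPI/",
}
FIELDS = ("ApplicableAt", "ApplicableFor", "Value")


def extract_rows(xml_text):
    """Return one dict per CLSPublicationObjectDataBE element."""
    root = ET.fromstring(xml_text)
    rows = []
    for obj in root.findall(".//d:CLSMIPIPublicationObjectBE", NS):
        name = obj.findtext("d:PublicationObjectName", namespaces=NS)
        path = "d:PublicationObjectData/d:CLSPublicationObjectDataBE"
        for data in obj.findall(path, NS):
            row = {"PublicationObjectName": name}
            for field in FIELDS:
                row[field] = data.findtext("d:" + field, namespaces=NS)
            rows.append(row)
    return rows


root = ET.fromstring(SAMPLE)
# The un-prefixed path from the question matches nothing ...
unprefixed = root.find("Body/GetPublicationDataWMResponse")
# ... while the same path with mapped prefixes finds the result element.
prefixed = root.find(
    "soap:Body/d:GetPublicationDataWMResponse/d:GetPublicationDataWMResult", NS
)

rows = extract_rows(SAMPLE)
for row in rows:
    print(",".join(row.values()))
```

Because each row is a plain dict, the resulting list can be passed to pandas.DataFrame(rows) or to a database insert, which is the stated goal of the question.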