Namespace argument in Python lxml parsing


I have an HTML page that I am trying to parse. Here is what I did with lxml:

>>> from lxml import etree
>>> node = etree.fromstring(html)
>>> node
<Element {http://www.w3.org/1999/xhtml}html at 0x110676a70>
>>> node.xpath('//body')
[]
>>> node.xpath('body')
[]

Unfortunately, all of my xpath calls now return an empty list. Why does this happen, and how can I fix the call?

You can pass the namespace to the call, like this:

>>> node.xpath('//xmlns:tr', namespaces={'xmlns':'http://www.w3.org/1999/xhtml'})
[<Element {http://www.w3.org/1999/xhtml}tr at 0x11067b6c8>, <Element {http://www.w3.org/1999/xhtml}tr at 0x11067b710>]
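
The empty lists in the question come from the fact that every element sits in the XHTML namespace, while an unprefixed name in XPath 1.0 only matches elements in no namespace. The following is a minimal sketch of that behaviour; the sample document and the prefix "x" are made up for illustration and are not the asker's page.

```python
from lxml import etree

# Hypothetical sample document standing in for the asker's page
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><table><tr><td>a</td></tr><tr><td>b</td></tr></table></body>'
    '</html>'
)
node = etree.fromstring(XHTML)

# An unprefixed name only matches elements that have no namespace,
# so this finds nothing in a namespaced document
unqualified = node.xpath('//body')

# Bind an arbitrary prefix ("x" is our own choice) to the XHTML URI
ns = {'x': 'http://www.w3.org/1999/xhtml'}
bodies = node.xpath('//x:body', namespaces=ns)
rows = node.xpath('//x:tr', namespaces=ns)

print(unqualified, bodies[0].tag, len(rows))
```

The prefix used in the query does not have to match anything in the document; only the namespace URI it is bound to matters.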

You need to use a namespace prefix in the query. For example:

node.xpath('//html:body', namespaces={'html': 'http://...'})

Or you can use .nsmap:

node.xpath('//html:body', namespaces=node.nsmap)

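This assumes that all namespaces are defined on the tag that node points to, which is usually true for most XML documents. It also assumes that the document declares the html prefix itself: a default namespace (a bare xmlns="...") is stored in nsmap under the key None, which cannot be used as a prefix in an XPath expression.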
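
When the document only declares a default namespace, as the element repr in the question suggests, node.nsmap cannot be passed to xpath() directly. One possible workaround, sketched here under the assumption that remapping the None key to a prefix of our own choosing ("html") is acceptable, looks like this. The sample document is hypothetical.

```python
from lxml import etree

# Hypothetical sample document with a default (unprefixed) namespace
XHTML = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>'
node = etree.fromstring(XHTML)

# The default namespace appears in nsmap under the key None
default_uri = node.nsmap.get(None)

# Keep the prefixes the document declares, and drop the None key
ns = {prefix: uri for prefix, uri in node.nsmap.items() if prefix is not None}

# Give the default namespace a prefix of our own ("html" is an assumption;
# it would override a prefix of the same name declared in the document)
if default_uri is not None:
    ns['html'] = default_uri

bodies = node.xpath('//html:body', namespaces=ns)
print(ns, len(bodies))
```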
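It may be that all of the tags are namespaced, as you guessed. Using the HTML parsing module (lxml.html.fromstring) is probably the easiest option; otherwise, when working with namespaces, you have to do something like this:

node.xpath('//html:body', namespaces={'html': 'http://www.w3.org/1999/xhtml'})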
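
As a rough illustration of the lxml.html.fromstring route: the HTML parser does not put elements into a namespace, so plain XPath names should match again. The page content below is a made-up stand-in for the asker's document.

```python
import lxml.html

# Hypothetical page; the xmlns attribute mirrors the XHTML case in the question
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><table><tr><td>a</td></tr></table></body>'
    '</html>'
)

# Parse with the HTML parser instead of the XML parser
doc = lxml.html.fromstring(PAGE)

# Element names carry no namespace, so no prefix or namespaces= is needed
bodies = doc.xpath('//body')
rows = doc.xpath('//tr')

print(doc.tag, len(bodies), len(rows))
```

This trades strict XML parsing for the more forgiving HTML parser, which is usually what you want for pages fetched from the web.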