Python: How to remove the namespace value from lxml.html.html5parser element tags

Is it possible to use the HTML5 parser from the lxml.html package without it adding a namespace to the tags?

For example:

from lxml import html
print(html.parse('http://example.com').getroot().tag)
# You will get 'html'

from lxml.html import html5parser
print(html5parser.parse('http://example.com').getroot().tag)
# You will get '{http://www.w3.org/1999/xhtml}html'

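The simplest solution I have found is to strip the namespace with a regex, but is there a way to keep that text from being added in the first place?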

There is a dedicated boolean flag, namespaceHTMLElements, that controls this behavior:

from lxml.html import html5parser
from html5lib import HTMLParser

root = html5parser.parse('http://example.com', 
                         parser=HTMLParser(namespaceHTMLElements=False))    
print(root.tag)  # prints "html"
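To see the effect of the flag without fetching a URL, the following sketch parses an in-memory string with html5lib directly. It assumes html5lib is installed and relies on its default `etree` tree builder, so the result is a standard-library `xml.etree` element rather than an lxml one; the sample markup is made up for illustration.

```python
import html5lib

# Illustrative markup; any HTML string would do.
markup = "<!DOCTYPE html><title>t</title><p>hello</p>"

# Default behavior: HTML elements are placed in the XHTML namespace.
namespaced = html5lib.parse(markup)

# With the flag turned off, tags are plain local names.
plain = html5lib.parse(markup, namespaceHTMLElements=False)

print(namespaced.tag)             # {http://www.w3.org/1999/xhtml}html
print(plain.tag)                  # html
print(plain.find("body/p").text)  # hello
```

With the namespace disabled, path lookups such as `find("body/p")` can use bare tag names instead of the `{uri}tag` form.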

In principle, this also works with lxml's own API, as I understand it, but see.