Python: Robust DOM parsing with getElementsByTagName (Python, Dom) - Fatal编程技术网

The following code (taken from "Dive Into Python")

fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/htmlToNumEmbedded.py", line 2, in <module>
    xmldoc = minidom.parse('/path/to/index.html')
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: mismatched tag: line 12, column 4
But this seems rather clumsy: is there a built-in function I have overlooked, or another, more elegant way to do robust DOM parsing with getElementsByTagName?
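The traceback points at minidom.parse, not at getElementsByTagName: minidom builds its tree with the expat XML parser, which only accepts well-formed XML, and ordinary HTML (an unclosed <br> or <p>, for instance) is not. The sketch below, written for Python 3, shows the try/except-around-parsing pattern. The helper name img_sources and the two inline markup strings are hypothetical stand-ins for index.html.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical inputs: the first is well-formed XHTML, the second has an
# unclosed <br>, which HTML allows but XML (and therefore expat) does not.
WELL_FORMED = "<html><body><img src='a.png'/><img src='b.png'/></body></html>"
MALFORMED = "<html><body><p>line one<br>line two</p></body></html>"


def img_sources(markup):
    """Return the src of every <img>, or None if expat rejects the markup."""
    try:
        doc = minidom.parseString(markup)
    except ExpatError as exc:
        # Same exception type as in the traceback (e.g. "mismatched tag")
        print("ExpatError:", exc)
        return None
    # getElementsByTagName only runs once parsing has succeeded
    return [node.getAttribute("src") for node in doc.getElementsByTagName("img")]


print(img_sources(WELL_FORMED))  # ['a.png', 'b.png']
print(img_sources(MALFORMED))    # None, after the ExpatError message
```

Catching the error only tells you the file is not XML; it does not recover the <img> elements, which is why the answers reach for HTML-aware parsers.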

You can do this with BeautifulSoup:

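from bs4 import BeautifulSoup

# Name the parser explicitly; "html.parser" tolerates unclosed and mismatched tags
with open('/path/to/index.html') as f:
    soup = BeautifulSoup(f, "html.parser")
imgs = soup.find_all("img")  # list of every <img> Tag in the document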
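BeautifulSoup's "html.parser" builder sits on top of the standard library's html.parser module. If installing a third-party package is not an option, a minimal sketch of the same idea using only that module might look like this. It is a swapped-in technique, not part of the original answer, and the class name ImgCollector and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag; unbalanced markup is tolerated."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lower-cased; attrs is a list of (name, value) pairs.
        # Self-closing <img/> also lands here via the default handle_startendtag.
        if tag == "img":
            self.images.append(dict(attrs))


# Hypothetical markup with unclosed <p> and <br> tags that expat would reject
MARKUP = "<html><body><p>one<br>two<img src='a.png'><p>three<img src='b.png' alt='x'></body></html>"

parser = ImgCollector()
parser.feed(MARKUP)
parser.close()
print([img.get("src") for img in parser.images])  # ['a.png', 'b.png']
```

This gives a flat list of attribute dicts rather than a navigable tree, which is the trade-off against BeautifulSoup or lxml.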

Alternatively, with lxml: if you need a list of elements rather than an iterator, call list on the return value of iter:

from lxml import html

# lxml.html.parse accepts malformed HTML; iter('img') yields every <img> element lazily
reflist = list(html.parse('/path/to/index.html').iter('img'))
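To see why the list call matters: iter returns a one-shot iterator, so a second pass over it yields nothing, while a list can be reused, indexed, and measured with len. The sketch below illustrates this with the standard library's xml.etree.ElementTree, whose Element.iter has the same shape as lxml's; it assumes well-formed inline markup (hypothetical) because ElementTree, unlike lxml.html, will not repair broken HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed document standing in for index.html
root = ET.fromstring("<body><img src='a.png'/><div><img src='b.png'/></div></body>")

it = root.iter("img")                # lazy iterator over matching descendants
first = [e.get("src") for e in it]   # consumes the iterator
second = [e.get("src") for e in it]  # already exhausted, so this is empty
print(first, second)                 # ['a.png', 'b.png'] []

reflist = list(root.iter("img"))     # materialized: reusable and indexable
print(len(reflist), reflist[1].get("src"))  # 2 b.png
```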