Python: BeautifulSoup hangs when using find_all


I have a problem with the bs4 package. I have an HTML document like the following:
data = """<html><head></head><body>
<p> this is tab </p>
<img src="image.jpg">
</body></html>
"""

When I run it, bs4 stays in a loop and returns nothing, possibly because the a tag does not exist in some of the HTML data.
Thank you very much.

1. Yes, the example above works fine.

2. In my case, however, data is a variable holding a multi-line HTML string read from a file:
from bs4 import BeautifulSoup
data = open("file.htm").read()
soup = BeautifulSoup(data, 'html5lib')
soup.find_all("a")

3. Please test with my file:

4. I am using beautifulsoup4==4.4.1 with Python 3.5.1.

5. Thanks again.
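I don't see why your program would hang when calling find_all. If the HTML page is large it may take a while, but it should not hang.
Here are some things you can try:

If you download the web page before parsing it, the download itself may be what hangs. To find exactly where the program hangs, add the line import pdb; pdb.set_trace() at the start of your code and step through from there.

Make sure html5lib is installed by running pip freeze | grep html5lib. If it is not listed, install it with pip install html5lib.

In a similar SO question, someone mentioned that upgrading BeautifulSoup fixed it. Try: pip install --upgrade beautifulsoup4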


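The documentation recommends specific parsers for certain Python versions:
"If you can, I recommend you install and use lxml for speed.

If you're using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it's essential that you install lxml or html5lib: Python's built-in HTML parser is just not very good in older versions."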
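As a complement to stepping through with pdb, the sketch below times one step at a time and uses the standard library's faulthandler module to dump a traceback if a step is still running after a timeout. The helper name timed and the 30-second default are assumptions made for illustration; they are not part of bs4 or of the original answer.

```python
import faulthandler
import sys
import time


def timed(label, func, *args, timeout=30.0, **kwargs):
    """Run func, report how long it took, and dump a traceback if it stalls.

    `timed` and its `timeout` default are hypothetical names for this sketch.
    """
    # If the call is still running after `timeout` seconds, faulthandler
    # writes the traceback of every thread to stderr. The program keeps
    # running, so the dump shows where it is stuck at that moment.
    faulthandler.dump_traceback_later(timeout, file=sys.stderr)
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        # Cancel the pending dump once the call has returned or raised.
        faulthandler.cancel_dump_traceback_later()
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f}s", file=sys.stderr)


# Possible usage with the code from the question
# (assumes bs4 and html5lib are installed and file.htm exists):
#
#   from bs4 import BeautifulSoup
#   data = timed("read", lambda: open("file.htm").read())
#   soup = timed("parse", BeautifulSoup, data, "html5lib")
#   links = timed("find_all", soup.find_all, "a")
```

If the "parse" step is the one that never finishes, the problem is likely in the parser rather than in find_all, which would point toward trying a different parser or upgrading.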
Try the built-in html.parser; it can handle even invalid HTML.
from bs4 import BeautifulSoup

data = """<html><head></head><body>
<p> this is tab </p>
<img src="image.jpg">
</body></html>
"""

soup = BeautifulSoup(data, 'html.parser')
soup.find_all("a")
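To check whether a hang comes from bs4 or from the data itself, one option is to count the tags with the standard library's html.parser.HTMLParser directly, bypassing bs4 altogether. This is a swapped-in cross-check rather than what the answer above does, and the names TagCounter and count_tags are hypothetical.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name using only the standard library."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == self.tag_name:
            self.count += 1


def count_tags(html, tag_name):
    """Return how many start tags named tag_name appear in html."""
    counter = TagCounter(tag_name.lower())
    counter.feed(html)
    counter.close()
    return counter.count


data = """<html><head></head><body>
<p> this is tab </p>
<img src="image.jpg">
</body></html>
"""

print(count_tags(data, "a"))
print(count_tags(data, "img"))
```

If this finishes quickly on the real file while the bs4 version does not, the input is probably fine and the parser passed to BeautifulSoup is the more likely culprit.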

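It works for me and returns an empty list. Which version of BeautifulSoup are you using? You can check with import bs4; print(bs4.__version__)
Yes! I have tried all the parsers and only html5lib works correctly. The other parsers sometimes produce wrong HTML output.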
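The version question can also be answered for bs4 and both optional parsers at once. The sketch below assumes Python 3.8 or later, where importlib.metadata is available (the asker's Python 3.5.1 would need the bs4.__version__ check instead); the function name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("beautifulsoup4", "html5lib", "lxml")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output in a question makes it easier for others to reproduce parser-specific behaviour.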