Python: Beautiful Soup error in a simple script
I'm running BeautifulSoup 4.5 and Python 3.4 on Windows 7. Here is my script:
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
url = 'https://scholar.google.com'
response = http.request('GET', url)
html2 = response.read()
soup = BeautifulSoup([html2])
print (type(soup))
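Here is the error I get:
TypeError: expected string or buffer
I've looked into this and, apart from going back to an older version of Beautiful Soup, which I would rather not do, there doesn't seem to be a fix. Any help would be much appreciated.
I'm not sure why you are putting the HTML string into a list here: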
soup = BeautifulSoup([html2])
Replace it with:
soup = BeautifulSoup(html2)
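To make the difference concrete, here is a minimal offline sketch. The HTML literal is a made-up stand-in for the downloaded page (an assumption, so the example runs without network access), and the parser name is passed explicitly. It shows that the list wrapper is neither `str` nor `bytes`, while the unwrapped value is exactly what BeautifulSoup expects.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the bytes downloaded in the question's script.
html2 = b"<html><head><title>Example</title></head><body><p>Hello</p></body></html>"

# The question's script wrapped the markup in a list; the parser needs
# a string or bytes value, not a list that contains one.
wrapped = [html2]
print(isinstance(wrapped, (str, bytes)))  # False
print(isinstance(html2, (str, bytes)))    # True

# Passing the bytes directly parses without error.
soup = BeautifulSoup(html2, "html.parser")
print(type(soup))         # <class 'bs4.BeautifulSoup'>
print(soup.title.string)  # Example
```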
Alternatively, you can pass the response itself, a file-like object, and BeautifulSoup will read it for you:
response = http.request('GET', url)
soup = BeautifulSoup(response)
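The file-like behaviour can be tried without a network connection. In the sketch below, `io.BytesIO` is only a stand-in for the HTTP response (an assumption for illustration); what matters is that the object has a `read()` method, which BeautifulSoup calls to obtain the markup.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for an HTTP response: any object with .read() works.
fake_response = io.BytesIO(b"<html><body><h1>Scholar</h1></body></html>")

# BeautifulSoup reads the markup from the file-like object itself.
soup = BeautifulSoup(fake_response, "html.parser")
print(soup.h1.get_text())  # Scholar
```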
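It is also a good idea to specify the parser explicitly:
soup = BeautifulSoup(html2, "html.parser")
Thanks! I thought I had tried that, but I guess I hadn't, because this got rid of the error message.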
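For reference, the pieces of the answer could be combined roughly as follows. This is a sketch rather than the original poster's final script: the helper names `soup_from_bytes` and `fetch_soup` are hypothetical, and taking the body from urllib3's `response.data` attribute instead of `response.read()` is an assumption made here because, with urllib3's default preloading, the body is already stored on that attribute.

```python
import urllib3
from bs4 import BeautifulSoup


def soup_from_bytes(body):
    """Parse raw HTML (bytes or str) with the standard-library parser."""
    return BeautifulSoup(body, "html.parser")


def fetch_soup(url):
    """Download url and return the parsed document (needs network access)."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    # Assumption: use response.data, the preloaded body, rather than
    # response.read(); pass it unwrapped, never inside a list.
    return soup_from_bytes(response.data)
```

Calling `fetch_soup('https://scholar.google.com')` would then return a `BeautifulSoup` object, provided the request succeeds.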