Python BeautifulSoup not print()ing
The BeautifulSoup script below doesn't display any output. Am I missing something? It is supposed to reach at least some of the print() calls.
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    return bsObj
    title = getTitle(url1)
    if title == None:
        print("None at URL: " + url1)
    else:
        print(title)
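With BeautifulSoup4, I'd suggest using the requests module (available via pip) to fetch the site data. To get the HTML of the site you want, use
content = requests.get(url).content
This stores the whole HTML document in the variable "content".
From there, you can print whatever data you need with the script below.
Note: I had problems installing lxml (an HTML parser that bs4 can use) on Python 3, so I used Python 2.7 for this.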
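import requests
from bs4 import BeautifulSoup as bs

def getTitle(url):
    content = requests.get(url).content
    page = bs(content, "lxml")
    # page.title is None when the document has no <title> tag;
    # check it before reading .string to avoid an AttributeError.
    if page.title is None:
        return None
    return page.title.string

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"
t = getTitle(url1)
# print(...) with a single argument behaves the same on Python 2.7 and 3.
if t is None:
    print("None at url " + url1)
else:
    print(t)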
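I tested this on my local machine (Win10, Python 2.7.12, with requests, beautifulsoup4 and lxml installed via pip) and it works fine.
If you want to learn more about requests, see its documentation; the BeautifulSoup documentation has more on that library.
Hope this helps.

Edit:
Your problem, finally... identified: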
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    return bsObj

# Dedented to module level, so these lines run when the script runs
# instead of sitting unreachable after the return.
title = getTitle(url1)
if title is None:
    print("None at URL: " + url1)
else:
    print(title)
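Old answer:
Your problem is that return bsObj stops the function before it reaches the print calls. The only output the function can produce comes from the HTTPError handler; the AttributeError handler returns None silently.
If you want to return bsObj, you need to do it at the end of the function, because return exits the function.
Oh, and you call the function recursively without any condition, so it would overflow the stack anyway.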
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    title = getTitle(url1) # Infinite recursion
    if title == None:
        print("None at URL: " + url1)
    else:
        print(title)
    return bsObj # Moved to the end
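
The point about return exiting the function can be seen without any scraping at all. Below is a minimal sketch, not taken from the thread: get_value and the printed list are made-up names, and the list just stands in for the terminal so the effect is easy to inspect. One statement is indented under the def after the return, the other is dedented to module level.

```python
printed = []  # hypothetical stand-in for terminal output


def get_value(source):
    value = source.upper()
    return value
    # Still indented under the def, but placed after return: dead code.
    printed.append("inside function: " + value)


# Dedented to module level: these lines run when the script runs.
result = get_value("page contents")
printed.append("at module level: " + result)

print(printed)  # ['at module level: PAGE CONTENTS']
```

Only the module-level line leaves a trace. In the question's script the call to getTitle was itself sitting after the return, so the function was defined but never called, which is why nothing appeared.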
This works for me:
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

def getContent(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    return bsObj

url1 = "https://www.youtube.com/watch?v=v5NeyI4-fdI"
content = getContent(url1)
if content is None:
    print("Content could not be found at URL: " + url1)
else:
    print(content)
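
One gap the scripts in this thread share: they catch HTTPError only, so a failure to reach the server at all (which urllib reports as URLError) would still surface as a traceback. Here is a rough sketch of handling both, under some assumptions: fetch_html, the opener parameter and always_404 are hypothetical names added for illustration, and the opener hook exists only so the function can be tried without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, opener=urlopen):
    """Return the response body as bytes, or None if the request fails.

    `opener` is a hypothetical hook (not in the original scripts) that
    lets the function be exercised without touching the network.
    """
    try:
        response = opener(url)
    except HTTPError as e:
        # The server answered, but with an error status such as 404.
        # HTTPError subclasses URLError, so it has to be caught first.
        print("HTTP error:", e.code)
        return None
    except URLError as e:
        # The server could not be reached (DNS failure, refused connection).
        print("URL error:", e.reason)
        return None
    return response.read()


def always_404(url):
    # Stand-in opener that simulates a missing page.
    raise HTTPError(url, 404, "Not Found", None, None)


body = fetch_html("https://example.invalid/page", opener=always_404)
if body is None:
    print("No content fetched")
else:
    print(body[:80])
```

Keeping the fetch in its own function and doing the printing at module level also makes the indentation mistake from the question harder to repeat.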
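Please check your indentation: should the lines after return bsObj really be at the same indentation level? As it stands, you always return before printing anything other than the HTTPError.
Awesome! Thanks a lot. You people are amazing. This is why I love Stack Overflow.
@Jacs No problem, happy to help.
Let me go through both answers and I'll give you more in-depth feedback.
@Jacs I found a way to do this on Python 3.5. It takes a few downloads and some command-line work, but it isn't hard, if you're interested.
@RedXTech Thanks for your help! It works fine now. It was an indentation problem.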