Python BeautifulSoup not printing

python printing error-handling

The BeautifulSoup script below does not show any output. Am I missing something? It is supposed to reach at least some of the print calls.

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    return bsObj

    title = getTitle(url1)

    if title == None:
        print("None at URL: " + url1)
    else:
        print(title)

With BeautifulSoup4, I recommend using the requests module (available through pip) to fetch the site data.

To get the HTML of the site you want, use

content = requests.get(url).content

This stores the entire HTML document in the variable content.

From there, you can print whatever data you need with the following script.

Note: lxml (an HTML parser that bs4 can use) gave me trouble when installing on Python 3, so I used Python 2.7 here.

import requests
from bs4 import BeautifulSoup as bs

def getTitle(url):
    content = requests.get(url).content
    page = bs(content, "lxml")
    title = page.title.string
    return title

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"
t = getTitle(url1)

# Parenthesized print runs on both Python 2.7 and Python 3
if t is None:
    print("None at url " + url1)
else:
    print(t)

I tested this on my local machine (Win10, Python 2.7.12, with requests, beautifulsoup4, and lxml installed via pip) and it works fine.
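One caveat with the getTitle function above: if a page has no <title> element, page.title is None and page.title.string raises AttributeError. The sketch below is not BeautifulSoup. It uses only the standard library's html.parser, an assumption made so that it runs without lxml or bs4 installed, and it shows the same title lookup returning None in that case. The names TitleParser and get_title are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None  # stays None if no <title> is ever seen

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self.in_title:
            self.title += data


def get_title(html_text):
    """Return the page title as a string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title


# Inline HTML keeps the example independent of the network
print(get_title("<html><head><title>Example page</title></head></html>"))
print(get_title("<html><body><p>no title here</p></body></html>"))
```

With bs4 itself, the equivalent guard would be to check whether page.title is None before reading .string.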

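If you want to learn more about requests, see its documentation; more information on BeautifulSoup can be found in its documentation as well.

Hope this helps.

Edit: I finally identified your problem. The lines after return bsObj were indented into the function body, so they never ran. With those lines dedented to module level, the script prints its output: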

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    return bsObj

title = getTitle(url1)

if title == None:
    print("None at URL: " + url1)
else:
    print(title)

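Old answer: your problem is that return bsObj prevents the prints from executing. The only print the function can reach is the one for HTTPError; the AttributeError handler returns without printing. If you want to return bsObj, you need to return it at the end of the function, because return exits the function.

Oh, and the function calls itself unconditionally, so once the return is moved to the end it will overflow the stack anyway: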

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys

url1 = "https://www.youtube.com/watch?v=APmUWC8S1_M"

def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None

    title = getTitle(url1) # Infinite recursion

    if title == None:
        print("None at URL: " + url1)
    else:
        print(title)
    return bsObj # Moved to the end
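Both points above (return leaves the function immediately, and an unconditional self-call never terminates) can be checked without any network access. A minimal sketch; the function names early_return and recurse_forever are hypothetical and exist only for illustration:

```python
def early_return(log):
    log.append("before return")
    return "result"
    # Unreachable: return has already left the function
    log.append("after return")


def recurse_forever():
    # No base case, so every call makes another call
    return recurse_forever()


log = []
value = early_return(log)
print(value, log)

try:
    recurse_forever()
except RecursionError as exc:
    # CPython stops runaway recursion once its recursion limit is hit
    overflowed = True
    print("RecursionError:", exc)
else:
    overflowed = False
```

Only "before return" ends up in log, which mirrors why the print calls indented after return bsObj in the question never ran.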

This works for me:

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import sys


def getContent(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read())
    except AttributeError as e:
        return None
    return bsObj

url1 = "https://www.youtube.com/watch?v=v5NeyI4-fdI"
content = getContent(url1)
if content is None:
    print("Content could not be found at URL: " + url1)
else:
    print(content)
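The getContent function above only catches HTTPError. urlopen can also raise URLError (the parent class of HTTPError) when the server cannot be reached at all. A hedged sketch of handling both follows; the opener parameter and the two fake_* helpers are hypothetical additions so the error paths can be exercised without a real request:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Return the response body as bytes, or None if the request fails.

    `opener` is a hypothetical parameter added for this sketch so the
    function can be tried out without network access.
    """
    try:
        response = opener(url)
    except HTTPError as e:
        # The server answered, but with an error status such as 404.
        # This clause must come first: HTTPError subclasses URLError.
        print("HTTP error:", e.code)
        return None
    except URLError as e:
        # No HTTP response at all (DNS failure, refused connection, ...)
        print("Could not reach server:", e.reason)
        return None
    return response.read()


def fake_not_found(url):
    # Stand-in opener that simulates a 404 response
    raise HTTPError(url, 404, "Not Found", None, None)


def fake_unreachable(url):
    # Stand-in opener that simulates a network failure
    raise URLError("connection refused")


print(fetch("http://example.invalid/missing", opener=fake_not_found))
print(fetch("http://example.invalid/", opener=fake_unreachable))
```

Called as fetch(url1) with the default opener, it performs a real request, and the returned bytes can be passed to BeautifulSoup as in the script above.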

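Please check your indentation: should the lines after return bsObj really be at the same indentation level as it? As it stands, you always return before printing anything other than the HTTPError.

Awesome! Thanks a lot. You guys are amazing. This is why I love Stack Overflow.

@Jacs No problem, happy to help.

Let me look at both answers and I'll give you more in-depth feedback.

@Jacs I found a way to do it on Python 3.5 with just a few downloads and some command-line work; it's not hard, if you're interested.

@RedXTech Thanks for your help! It works great now. It was an indentation problem.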