Python: I'm trying to make a web scraper and I keep getting this error

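I'm trying to make a web scraper with Beautiful Soup that prints the most popular post on Reddit, but I keep getting an error. Please explain in terms as simple as possible. Here is the code: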

import requests
from bs4 import BeautifulSoup
url = 'https://www.reddit.com/'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
article = soup.find('div', attrs={"class": "y8HYJ-y_lTUHkQIc1mdCq _2INHSNB8V5eaWp4P0rY_mE"})
headline = article.a.h3.text
print(headline)

Error:

AttributeError: 'NoneType' object has no attribute 'a'

Adding a User-Agent header might help. Something like this:

headers={'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6'}

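# Pass the dictionary by keyword: the second positional argument of requests.get is params, not headers.
response = requests.get(url, headers=headers)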
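As a rough illustration of why the keyword matters, the sketch below builds two requests without sending them and compares the result. The User-Agent value `my-scraper/0.1` is a made-up placeholder, not something from the original answer.

```python
import requests

url = "https://www.reddit.com/"
headers = {"User-Agent": "my-scraper/0.1"}  # hypothetical User-Agent value

# Build the requests without sending them, so nothing touches the network.
as_headers = requests.Request("GET", url, headers=headers).prepare()
as_params = requests.Request("GET", url, params=headers).prepare()

# Passed as headers, the value ends up in the request headers.
print(as_headers.headers["User-Agent"])

# Passed as params (which is what a second positional argument to
# requests.get means), the dictionary is appended to the URL as a
# query string instead, and no User-Agent header is set from it.
print(as_params.url)
```

Inspecting a prepared request like this is only a debugging aid; in the scraper itself, `requests.get(url, headers=headers)` is enough.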

Lists of common User-Agent strings can be found online.

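"Please explain in terms as simple as possible"

Read the error message one piece at a time:

AttributeError: "There is an error that has to do with an attribute."

'NoneType' object: "It is because something in your program is a NoneType object (that is, None),"

has no attribute 'a': "and you tried to do .a with it, but that isn't possible."

headline = article.a.h3.text
                  ^^

This is where you tried to get .a from something, which means article is None.

article = soup.find('div', attrs={"class": "y8HYJ-y_lTUHkQIc1mdCq _2INHSNB8V5eaWp4P0rY_mE"})

This is how article got its value, which means soup.find returned None.

Then check the documentation: a None result means that BeautifulSoup could not find a <div> tag with that class attribute value in the HTML. So of course you cannot find the nested <a> tag, because there is nothing for it to be nested in.

Most likely the server generates the class names randomly, so you need to look at something else in the HTML to identify the element you actually want, rather than relying on the class names you saw when viewing the page source.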
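One way to avoid the crash is to check the result of find before using it. The following is a minimal sketch, not the only approach: the helper name first_headline, the class name "post", and the inline HTML strings are all invented for illustration so that the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page. It contains no
# <div class="post">, so find() has nothing to return.
SAMPLE_HTML = "<html><body><p>No posts here</p></body></html>"


def first_headline(html, css_class="post"):
    """Return the text of the first <h3> inside the matching <div>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("div", attrs={"class": css_class})
    if article is None:
        # find() returns None when nothing matches, so stop here
        # instead of accessing attributes on None.
        return None
    heading = article.find("h3")
    return heading.get_text(strip=True) if heading is not None else None


print(first_headline(SAMPLE_HTML))
print(first_headline('<div class="post"><a href="#"><h3>Hello</h3></a></div>'))
```

With this guard the program can print a clear message such as "no matching element" instead of raising AttributeError, which makes it easier to tell that the selector, not the rest of the code, is the problem.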
soup.find(...) returns None (nothing was found). That is the cause of your error.

I'm completely new to this. Can someone help me? :(

There is no such class in your soup.

You are all awesome. You all helped me a lot. I hope you all have a great day :)

You can use the "old" version of Reddit to get the information (the new version uses JavaScript, so BeautifulSoup does not see some of the elements that you see in the browser):

import requests
from bs4 import BeautifulSoup


url = 'https://old.reddit.com/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

print(soup.select_one('.entry a.title').text)

Prints:

Megathread: President Donald Trump announces he has tested positive for Coronavirus

Or: use .json after the URL:

import json
import requests


url = 'https://reddit.com/.json'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
data = requests.get(url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print(data['data']['children'][0]['data']['title'])

Note: Reddit also has an API, so you don't have to use BeautifulSoup.
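The JSON approach indexes straight into data -> children -> data -> title, which fails if any level is missing. Below is a small sketch of walking that structure more defensively. The SAMPLE payload is invented and heavily trimmed, it only mirrors the nesting used above, and the helper name titles is hypothetical.

```python
import json

# Hypothetical, heavily trimmed payload in the shape used above:
# data -> children -> data -> title.
SAMPLE = json.loads("""
{"data": {"children": [
    {"data": {"title": "First post"}},
    {"data": {"title": "Second post"}},
    {"data": {}}
]}}
""")


def titles(listing, limit=5):
    """Collect up to `limit` post titles, skipping entries without one."""
    children = listing.get("data", {}).get("children", [])
    found = []
    for child in children:
        title = child.get("data", {}).get("title")
        if title is not None:
            found.append(title)
        if len(found) == limit:
            break
    return found


print(titles(SAMPLE))
```

In a real run, the dictionary returned by requests.get(url, headers=headers).json() would be passed in place of SAMPLE.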