Python: How to extract h2 and h3 headings from a news article

I'm trying to build a web scraper that extracts the main headline of a news article.

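# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = input('enter the url \n')

# Download the page and parse it with the built-in HTML parser
r = requests.get(url)
content = r.content
soup = BeautifulSoup(content, "html.parser")

# Collect every <h1> tag; heading[0] raises IndexError if none are found
heading = soup.find_all('h1')
print(heading)
print(heading[0].text.strip())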

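This only works for headlines inside h1 tags, and it raises an error when the headline is in an h2 or h3 tag. How can I modify this code so that it also works for h2 and h3 tags? Thanks in advance.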

BeautifulSoup is very flexible; just pass in a list of the tag names you want to find:

soup.find_all(['h1', 'h2', 'h3'])
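
As a rough sketch of how the list form behaves, the snippet below runs it against a small inline HTML string instead of a live page. The markup and its contents are made up for illustration; only `find_all` with a list of names comes from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded article page
html = """
<html><body>
  <h2>  2019-05-01 </h2>
  <h3>Main headline</h3>
  <p>Body text</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of the listed tags, in document order
headings = soup.find_all(["h1", "h2", "h3"])
texts = [h.get_text(strip=True) for h in headings]

# Guard against pages with no matching tags instead of indexing blindly
first_heading = texts[0] if texts else None
print(texts)
print(first_heading)
```

Checking for an empty result before indexing avoids the IndexError that `heading[0]` raises when a page has no tag of the requested kind.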

You can even use a regular expression:

import re

soup.find_all(re.compile(r"^h\d$"))  # would match "h" followed by a single digit
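
To see which tag names the pattern actually picks up, here is a small self-contained sketch. The sample markup is invented for illustration; it includes `header` and `hr` tags to show that the `^` and `$` anchors keep them out of the result.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup mixing real headings with look-alike tag names
html = "<h1>Site</h1><header>Nav</header><h3>Story</h3><hr/><p>Text</p>"

soup = BeautifulSoup(html, "html.parser")

# The compiled pattern is tested against each tag's name
heading_pattern = re.compile(r"^h\d$")
matches = [(tag.name, tag.get_text()) for tag in soup.find_all(heading_pattern)]
print(matches)
```

If only the standard HTML heading levels are wanted, a pattern such as `^h[1-6]$` is a stricter alternative.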

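Thanks a lot for the help, Alex, this works and I was able to extract the h1 and h2 tags. But how do I extract the main headline from an article where, for example, the main headline is in an h3 tag and the date is in an h2?

@Amitz OK, you can find the date by its class name: soup.find(class_="date-header").get_text(), and the same goes for the article title: soup.find(class_="post-title").get_text().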
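
The class-name lookup from the comment could be wrapped roughly as follows. This is only a sketch: the class names `date-header` and `post-title` are taken from the comment, while the sample markup and the helper `text_by_class` are hypothetical, and a real page may use different class names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the two class names come from the thread
html = """
<div class="post">
  <h2 class="date-header">Monday, 6 May</h2>
  <h3 class="post-title entry-title"> Main headline </h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def text_by_class(soup, class_name):
    """Return the stripped text of the first tag with this class, or None."""
    tag = soup.find(class_=class_name)
    # find() returns None when nothing matches, so check before get_text()
    return tag.get_text(strip=True) if tag is not None else None


date = text_by_class(soup, "date-header")
title = text_by_class(soup, "post-title")
missing = text_by_class(soup, "byline")
print(date, title, missing)
```

Because `find` returns None when no tag matches, calling `.get_text()` directly on its result fails on pages that lack the class; the helper returns None in that case.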