Python: printing only the lines with '<a href>' inside a div (HTML), using BeautifulSoup

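I'm using BeautifulSoup to open a URL, find the div labeled "audience-container", and then print only the lines that start with "a href". I have the first two parts done (I think), but I can't figure out how to extract just the "a href" lines from that section: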

import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("httm://www.champlain.edu/current-students")
bs = BeautifulSoup(html.read(), "html parser")
for link in bs.find('div', {'id': 'audience-container'}):
    print(link) #this prints the full section under audience-container, but not what I want
    # print statement to pull out ONLY'a href' that I keep messing up

Try this:

import requests
from bs4 import BeautifulSoup


main_url = "https://www.champlain.edu"
bs = BeautifulSoup(requests.get(f"{main_url}/current-students").text, "html.parser")
for link in bs.find('div', {"id": "audience-nav"}).find_all("a"):
    print(f"{main_url}/{link.get('href')}")

from bs4 import BeautifulSoup
import requests

url = "http://www.champlain.edu/current-students"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, 'lxml')

for link in soup.find_all('a'):
    print(link.get('href'))

Output:

https://www.champlain.edu/admitted-students
https://www.champlain.edu/current-students
https://www.champlain.edu/prospective-students
https://www.champlain.edu/undergrad-applicants
https://www.champlain.edu/online
https://www.champlain.edu/alumni
https://www.champlain.edu/parents
https://www.champlain.edu/faculty-and-staff
https://www.champlain.edu/school-counselors
https://www.champlain.edu/employer-resources
https://www.champlain.edu/prospective-employees
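
For anyone who wants to try the idea without hitting the live site, here is a minimal sketch that runs against an inline HTML string. The sample markup, the helper name links_in_div, and the use of urljoin to build absolute URLs are my own assumptions for illustration; the real page's structure may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the live HTML may differ.
SAMPLE_HTML = """
<div id="audience-container">
  <p>Information for:</p>
  <a href="/admitted-students">Admitted Students</a>
  <a href="current-students">Current Students</a>
  <a name="no-href">Anchor without an href</a>
</div>
<a href="/outside">Link outside the div</a>
"""


def links_in_div(html, div_id, base_url):
    """Return absolute URLs for every <a href> inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=div_id)
    if container is None:  # find() returns None when nothing matches
        return []
    # href=True skips <a> tags that have no href attribute
    return [urljoin(base_url, a["href"]) for a in container.find_all("a", href=True)]


for url in links_in_div(SAMPLE_HTML, "audience-container", "https://www.champlain.edu/"):
    print(url)
```

urljoin handles hrefs with or without a leading slash, so the result does not depend on how the page writes its links. The None check avoids an AttributeError when the div id does not exist on the page.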

There is a mistake in your code: httm instead of http.

I hope this helps.
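
A typo like this can be caught before making any request by checking the URL scheme. This is only a sketch; the helper name has_http_scheme is hypothetical and not part of any library.

```python
from urllib.parse import urlparse


def has_http_scheme(url):
    """Return True if the URL's scheme is http or https."""
    return urlparse(url).scheme in ("http", "https")


# The URL from the question, with the mistyped scheme, and the corrected one.
for url in ("httm://www.champlain.edu/current-students",
            "http://www.champlain.edu/current-students"):
    print(url, has_http_scheme(url))
```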

First of all, you have a typo: it's http(s), not httm.

Thank you and everyone else. The feedback was very useful.