Python: print only the lines with '<a href>' inside a div (HTML) using BeautifulSoup


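I'm using BeautifulSoup to open a URL, find the div labeled "audience-container", and then print only the lines that start with "a href". I have the first two parts done (I think), but I can't figure out how to extract only the "a href" lines from that section: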

import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("httm://www.champlain.edu/current-students")
bs = BeautifulSoup(html.read(), "html parser")
for link in bs.find('div', {'id': 'audience-container'}):
    print(link) #this prints the full section under audience-container, but not what I want
    # print statement to pull out ONLY'a href' that I keep messing up
Try this:

import requests
from bs4 import BeautifulSoup


main_url = "https://www.champlain.edu"
bs = BeautifulSoup(requests.get(f"{main_url}/current-students").text, "html.parser")
for link in bs.find('div', {"id": "audience-nav"}).find_all("a"):
    print(f"{main_url}/{link.get('href')}")
Output:

https://www.champlain.edu/admitted-students
https://www.champlain.edu/current-students
https://www.champlain.edu/prospective-students
https://www.champlain.edu/undergrad-applicants
https://www.champlain.edu/online
https://www.champlain.edu/alumni
https://www.champlain.edu/parents
https://www.champlain.edu/faculty-and-staff
https://www.champlain.edu/school-counselors
https://www.champlain.edu/employer-resources
https://www.champlain.edu/prospective-employees
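
The `f"{main_url}/{link.get('href')}"` concatenation in the snippet works only when every href is a bare relative path. As a minimal sketch, assuming hypothetical sample markup in place of the live page (the `links_in_div` helper, `SAMPLE_HTML`, and `BASE_URL` are illustrative names, not part of the original answer), `urllib.parse.urljoin` from the standard library can resolve relative, root-relative, and absolute hrefs alike. It also skips anchors without an href and returns an empty list when the div is missing:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the live page.
SAMPLE_HTML = """
<div id="audience-nav">
  <a href="/admitted-students">Admitted Students</a>
  <a href="alumni">Alumni</a>
  <a href="https://www.champlain.edu/online">Online</a>
  <a>No href here</a>
</div>
<div id="footer"><a href="/contact">Contact</a></div>
"""

BASE_URL = "https://www.champlain.edu/current-students"


def links_in_div(html, div_id, base_url):
    """Return absolute URLs for every <a href=...> inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=div_id)
    if container is None:
        # find() returns None when nothing matches, so guard before calling find_all().
        return []
    # href=True keeps only anchors that actually carry an href attribute.
    return [urljoin(base_url, a["href"]) for a in container.find_all("a", href=True)]


links = links_in_div(SAMPLE_HTML, "audience-nav", BASE_URL)
for link in links:
    print(link)
```

To run this against the real page, the HTML string could be replaced with `requests.get(BASE_URL).text`, assuming the div id on the live site still matches.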
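# Alternative: print the href of every link on the page (the 'lxml' parser must be installed)
from bs4 import BeautifulSoup
import requests

url = "http://www.champlain.edu/current-students"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, 'lxml')

# find_all('a') searches the whole document, not only the target div
for link in soup.find_all('a'):
    print(link.get('href'))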
Your code has a typo: httm instead of http.
I hope this helps.
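
For context on why the loop in the question prints the whole section: iterating directly over the tag returned by `find()` walks its direct children, whatever they are, rather than searching for links. The sketch below uses made-up markup (`SAMPLE_HTML` is an assumption, not the real page) to contrast that with `find_all("a")` and with an equivalent CSS selector passed to `select()`:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; written without whitespace between tags
# so the div's only children are the three elements shown.
SAMPLE_HTML = (
    '<div id="audience-container">'
    "<p>Pick one:</p>"
    '<a href="/alumni">Alumni</a>'
    '<a href="/parents">Parents</a>'
    "</div>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", {"id": "audience-container"})

# Iterating over a Tag yields its direct children, not just the links.
children = [child.name for child in container]

# find_all("a") searches inside the div and returns only the anchor tags.
hrefs = [a.get("href") for a in container.find_all("a")]

# The same result with a CSS selector: anchors with an href inside the div.
hrefs_css = [a["href"] for a in soup.select("div#audience-container a[href]")]

print(children)
print(hrefs)
print(hrefs_css)
```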

First, you have a typo: it should be http(s), not httm.

Thank you, and everyone else. The feedback was very helpful.
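
The httm typo can be reproduced without touching the network, because `urlopen` rejects a scheme it has no handler for before any connection is attempted. A small sketch, where the `try_open` helper is a hypothetical name introduced only for this illustration:

```python
from urllib.error import URLError
from urllib.request import urlopen


def try_open(url):
    """Return None if the URL opens, otherwise the failure reason as text."""
    try:
        with urlopen(url, timeout=10):
            return None
    except URLError as exc:
        return str(exc.reason)


# The misspelled scheme from the question; this should fail with an
# "unknown url type" reason rather than a network error.
reason = try_open("httm://www.champlain.edu/current-students")
print(reason)
```

Wrapping the call this way surfaces the cause of the failure as a short message instead of a traceback, which makes this kind of typo easier to spot.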