How do I find the hyperlinks of a web page using BeautifulSoup and Python 3?

python python-3.x parsing hyperlink

I am writing a script that extracts only the hyperlinks from a web page. This is what I have so far:

import bs4 as bs
import urllib.request

# Download the raw HTML of the page.
source = urllib.request.urlopen('http://www.soc.napier.ac.uk/~40009856/CW/').read()

# Parse it with the lxml parser.
soup = bs.BeautifulSoup(source, 'lxml')

#for paragraph in soup.find_all('p'):
#    print(paragraph.string)

# Print the href attribute of every <a> tag.
for url in soup.find_all('a'):
    print(url.get('href'))

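I only want hyperlinks to other web pages, not links to PDFs or email addresses, which currently appear in the output as well.

How can I specify that only hyperlinks to web pages are returned?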

What is stopping you from analyzing the scraped hrefs? If something ends with .pdf, you don't want it. If it starts with file://, you don't want it. If it ends with / or .html, you probably want it. The possible duplicate question may also help.
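The filtering suggested in the comment could be sketched roughly as follows. This is only a minimal illustration under some assumptions, not a definitive answer: the helper names `is_page_link` and `page_links` and the `PAGE_EXTENSIONS` tuple are hypothetical, and treating extensionless paths as pages is a guess that may need adjusting for a particular site. It keeps links whose scheme is empty (relative), http, or https, and whose path ends in `/`, `.html`, `.htm`, or has no extension.

```python
import posixpath
from urllib.parse import urljoin, urlparse

# Assumption: which path extensions count as "web pages". "" covers
# directory-style paths ("/docs/") and extensionless paths ("/about").
PAGE_EXTENSIONS = ("", ".html", ".htm")


def is_page_link(href):
    """Return True if href looks like a link to another web page."""
    if not href:
        return False
    parts = urlparse(href.strip())
    # Rejects mailto:, file:, javascript: and other non-web schemes.
    if parts.scheme not in ("", "http", "https"):
        return False
    # Rejects empty hrefs and pure fragments such as "#top".
    if not parts.netloc and not parts.path:
        return False
    extension = posixpath.splitext(parts.path.lower())[1]
    return extension in PAGE_EXTENSIONS


def page_links(soup, base_url):
    """Collect absolute URLs of page-like links from a parsed document.

    `soup` is expected to be a BeautifulSoup object; `base_url` is the
    address the page was fetched from, used to resolve relative links.
    """
    links = []
    # href=True skips <a> tags that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if is_page_link(href):
            links.append(urljoin(base_url, href))
    return links
```

With the `soup` object from the question, calling `page_links(soup, 'http://www.soc.napier.ac.uk/~40009856/CW/')` should return only the links that pass the filter, resolved to absolute URLs. Checking the parsed URL scheme and path extension, rather than matching substrings of the raw href, avoids being fooled by query strings or upper-case extensions such as `.PDF`.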