Python: Retrieving links from a web page with BeautifulSoup


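I am trying to extract the link at a given position on a web page, open that link, and repeat the process a given number of times. The problem is that I keep getting the same URL back, so my code seems to just pull the tag and print it X times before exiting, instead of opening the link each time.

I have rewritten this code several times, but for the life of me I cannot figure it out. Please tell me what I am doing wrong.

I have also tried putting the anchor tags into a list, opening the URL at the requested position in the list, and then clearing the list before starting the loop again.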

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

#url = input('Enter - ')
url = "http://py4e-data.dr-chuck.net/known_by_Fikret.html"
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

count = 0 
url_loop = int(input("Enter how many times to loop through: ")) 
url_pos= int(input("Enter position of URL: "))
url_pos = url_pos - 1

print(url_pos)



# Retrieve all of the anchor tags
tags = soup('a')
while True:
    if url_loop == count:
        break
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    url = tags[url_pos].get('href', None)

    print("Acquiring URL: ", url)

    count = count + 1  

print("final URL:", url)

You extract the tags only once, from the initial document:

# Retrieve all of the anchor tags
tags = soup('a')

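Inside the loop you fetch and parse each new page, but `tags` still holds the anchors of the first page, so `tags[url_pos]` always returns the same link. If you re-extract the tags after fetching each document, they will reflect the most recently fetched document.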
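The fix described above can be sketched roughly as follows. This is only a minimal sketch, not the asker's final code: the helper names `fetch_html` and `follow_links` and the `fetch` parameter are hypothetical, introduced here so the loop can be tried without network access. It also assumes every `href` is an absolute URL and leaves out the SSL context from the question, which you would pass to `urlopen` again for HTTPS pages with certificate problems.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def follow_links(start_url, position, repeats, fetch=fetch_html):
    """Follow the link at the 1-based `position` on each page, `repeats` times.

    `fetch` is an assumption of this sketch: any callable that takes a URL
    and returns the page's HTML.
    """
    url = start_url
    for _ in range(repeats):
        soup = BeautifulSoup(fetch(url), 'html.parser')
        # Re-extract the anchor tags from the page that was just fetched,
        # so the list always belongs to the current document.
        tags = soup('a')
        url = tags[position - 1].get('href')
        print("Acquiring URL:", url)
    return url
```

With the question's start page, a call would look like `follow_links("http://py4e-data.dr-chuck.net/known_by_Fikret.html", url_pos, url_loop)`, where `url_pos` is the position as the user typed it (before subtracting 1). The only real change from the question's loop is that `tags = soup('a')` now runs inside the loop, after each page is parsed.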