Python: How to find all "next" links using BeautifulSoup

I am currently scraping all pages of a particular website by presetting a variable called number_of_pages. Presetting this variable worked until new pages I didn't know about were added. For example, the code below assumes 3 pages, but the website now has 4:

base_url = 'https://securityadvisories.paloaltonetworks.com/Home/Index/?page='
number_of_pages = 3
# Note: range(1, number_of_pages) stops at 2, so the upper bound
# needs + 1 to actually cover all three pages
for i in range(1, number_of_pages + 1):
    url_to_scrape = base_url + str(i)
I want to use BeautifulSoup to find all the "next" links on the website. The code below finds the second URL, but not the third or fourth. How can I build a list of all pages before scraping them?

import requests
from bs4 import BeautifulSoup

base_url = 'https://securityadvisories.paloaltonetworks.com/Home/Index/?page='
CrawlRequest = requests.get(base_url)
raw_html = CrawlRequest.text
linkSoupParser = BeautifulSoup(raw_html, 'html.parser')
page = linkSoupParser.find('div', {'class': 'pagination'})
# find() only returns the first "next" anchor on the page that was fetched,
# so this prints the URL of page 2 and nothing else
next_link = page.find('a', href=True, text='next')
nextURL = 'https://securityadvisories.paloaltonetworks.com' + next_link['href']
print(nextURL)

There are several different ways to implement pagination. This is one of them.

The idea is to start an endless loop and break out of it once there is no "next" link:
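The answer's original code block was not preserved in this copy of the page; the following is a sketch consistent with the output shown below. The helper name `find_next_url` and the `scrape_all_pages` wrapper are introduced here for illustration:

```python
import requests
from bs4 import BeautifulSoup

BASE = 'https://securityadvisories.paloaltonetworks.com'

def find_next_url(html):
    """Return the absolute URL of the 'next' link, or None on the last page."""
    soup = BeautifulSoup(html, 'html.parser')
    pagination = soup.find('div', {'class': 'pagination'})
    if pagination is None:
        return None
    next_link = pagination.find('a', href=True, string='next')
    if next_link is None:
        return None
    return BASE + next_link['href']

def scrape_all_pages():
    # Endless loop: follow the "next" link until there isn't one.
    with requests.Session() as session:  # persists cookies, reuses the connection
        url = BASE + '/Home/Index/?page='
        page_number = 1
        while True:
            print(f'Processing page: #{page_number}; url: {url}')
            response = session.get(url)
            # ... extract the advisories from response.text here ...
            url = find_next_url(response.text)
            if url is None:
                break
            page_number += 1
    print('Done.')
```

Calling `scrape_all_pages()` walks every page in order and stops by itself when a newly added page appears, so no preset `number_of_pages` is needed.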

If you execute it, you will see the following messages printed:

Processing page: #1; url: https://securityadvisories.paloaltonetworks.com/Home/Index/?page=
Processing page: #2; url: https://securityadvisories.paloaltonetworks.com/Home/Index/?page=2
Processing page: #3; url: https://securityadvisories.paloaltonetworks.com/Home/Index/?page=3
Processing page: #4; url: https://securityadvisories.paloaltonetworks.com/Home/Index/?page=4
Done.
Note that, to improve performance and to persist cookies between requests, we maintain a web-scraping session with requests.Session().
