
Multi-level web scraping in Python

python web-scraping scrapy pycharm


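I want to visit every URL on this list, copy the data from each page, and then return to the root list to pick up the next URL.

I can scrape a single page, but not multiple links.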

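You can find all the <a> tags that have an href attribute and pull them into a list. Then just iterate over that list. You may need to add some extra filtering, since you probably only want specific links, but this will get you going: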

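import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = 'https://express-press-release.net/Industries/Automotive-press-releases.php'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every <a> tag that has an href attribute
links = soup.find_all('a', href=True)

# Keep only the relative links containing '..' and resolve them against the page URL
link_list = [urljoin(url, a['href']) for a in links if '..' in a['href']]

for link in link_list:
    # Fetch each linked page and parse it the same way as the root page
    page = requests.get(link)
    page_soup = BeautifulSoup(page.text, 'html.parser')
    # do some stuff with page_soup ...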
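
If it helps, the same idea can be wrapped into small functions so the "root list, then each linked page" flow is explicit. This is only a rough sketch, not a definitive implementation: the names fetch_html, extract_links, scrape_children and parse_page are made up for illustration, the 10-second timeout is an arbitrary choice, and the '..' filter is simply carried over from the snippet above and may need adjusting for the real page.

```python
from urllib.parse import urljoin


def fetch_html(url):
    """Download a page and return its HTML text (hypothetical helper)."""
    import requests  # imported here so the crawl logic can be exercised offline
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return response.text


def extract_links(html, base_url):
    """Return absolute URLs for every <a href> containing '..', without duplicates."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    found = []
    for a in soup.find_all('a', href=True):
        if '..' in a['href']:
            absolute = urljoin(base_url, a['href'])
            if absolute not in found:
                found.append(absolute)
    return found


def scrape_children(root_url, parse_page, fetch=fetch_html, get_links=extract_links):
    """Visit every link found on root_url and store parse_page(html) per link."""
    results = {}
    for link in get_links(fetch(root_url), root_url):
        results[link] = parse_page(fetch(link))
    return results
```

You would call it as scrape_children(url, parse_page=my_parser), where my_parser is whatever function pulls the data you want out of one page's HTML. There is no error handling here, so a single failed request stops the run; wrap the inner fetch in a try/except if you want to skip bad pages instead.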

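Hi Sohel, please provide more detail and clarity, otherwise this question may be closed. Thanks.

Can you post some code, Sohel, to show what you have tried so far and why it didn't work?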