Passing dynamic URLs to a web crawler package in Python

python selenium scrapy web-crawler

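I am building a web scraper in Python to make my work easier. After some searching, I found that the Selenium, Beautiful Soup, and Scrapy packages can help with this. My requirement is a little different, though. Let me illustrate with an example: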

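import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    # The URL is hard-coded here; this is the part that should come from Excel.
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        # Emit the title of every post listed on the page.
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        # Follow the link to the older posts, if there is one.
        next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)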

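This is exactly what I want to achieve.

However, I don't want to enter the URL as a parameter. I want to be able to pick it up from a cell value in an Excel sheet.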
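
One possible way to do this, sketched under some assumptions: the URLs sit in a single column of an `.xlsx` file, the openpyxl package is used to read them, and the list is handed to the spider as its `start_urls` argument. The helper names (`extract_urls`, `read_urls_from_excel`, `crawl_from_excel`) and the file name `urls.xlsx` are hypothetical and not part of the original question.

```python
from urllib.parse import urlparse


def extract_urls(cell_values):
    """Keep only the cell values that look like absolute http(s) URLs."""
    urls = []
    for value in cell_values:
        if not isinstance(value, str):
            continue  # skip empty cells, numbers, dates, ...
        candidate = value.strip()
        parts = urlparse(candidate)
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(candidate)
    return urls


def read_urls_from_excel(path, sheet_name=None, column=1):
    """Read one column (1-based index) of a worksheet and return its URLs.

    Assumes openpyxl is installed; it is imported here so that
    extract_urls() stays usable without it.
    """
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook[sheet_name] if sheet_name else workbook.active
        rows = sheet.iter_rows(min_col=column, max_col=column, values_only=True)
        return extract_urls(row[0] for row in rows)
    finally:
        workbook.close()


def crawl_from_excel(spider_cls, path):
    """Run spider_cls with start_urls taken from the workbook at path.

    Scrapy stores keyword arguments passed to a spider as attributes,
    so start_urls given here replaces the class-level list.
    """
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess()
    process.crawl(spider_cls, start_urls=read_urls_from_excel(path))
    process.start()  # blocks until the crawl is finished
```

With the spider from the question, a call such as `crawl_from_excel(BlogSpider, "urls.xlsx")` would start the crawl from whatever URLs the first column of the active sheet contains. Passing the list as a spider argument leaves the spider class itself unchanged; setting `self.start_urls` inside the spider's `__init__` would be an equally reasonable alternative.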

The product described in [link] may be exactly what you need.

Thanks. I will read through it.