
Python Scrapy pagination: stopping the spider

python web-scraping pagination scrapy


Can someone help me understand what I'm doing wrong in the pagination of this code?

When I use a generic XPath:

if len(response.xpath("//*")) == 0:
    raise CloseSpider('No more products to scrape...')

I get all the data, but the spider never stops.

When I use this XPath instead:

if len(response.xpath("//h3[@class='shelf-product-name ']/a/@href")) == 0:
    raise CloseSpider('No more products to scrape...')

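The pages go from 0 to 50, so in theory the spider should return 2550 items (51 pages × 50 items per page).
But when I use the second XPath, the spider stops at some point before that, and I don't know why.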

import scrapy
from scrapy.exceptions import CloseSpider


class ProdutosSpider(scrapy.Spider):
    name = 'produtos_aplus'
    allowed_domains = ['www.allpartsnet.com.br']
    start_urls = ["https://www.allpartsnet.com.br/buscapagina?fq=B%3a1228&O=OrderByNameASC&PS=50&sl=5d58b484-137e-4091-92ca-29d2e0c70f85&cc=1&sm=0&PageNumber=0"]
    page = 0

    def parse(self, response):
        if len(response.xpath("//h3[@class='shelf-product-name ']/a/@href")) == 0:
            raise CloseSpider('No more products to scrape...')

        for produtos in response.xpath("//div[@class='QD prateleira row qd-xs n1colunas']/ul"):
            link = produtos.xpath(".//h3[@class='shelf-product-name ']/a/@href").get()
            cod_all = produtos.xpath(".//span[@class='insert-sku-name']/text()").get()
            yield response.follow(url=link, callback=self.parse_produto, meta={'link': link, 'cod_all': cod_all})

        self.page += 1
        yield scrapy.Request(
            url=f'https://www.allpartsnet.com.br/buscapagina?fq=B%3a1228&O=OrderByNameASC&PS=50&sl=5d58b484-137e-4091-92ca-29d2e0c70f85&cc=1&sm=0&PageNumber={self.page}',
            callback=self.parse
        )

    def parse_produto(self, response):
        link = response.request.meta['link']
        cod_all = response.request.meta['cod_all']
        for produtos in response.xpath("//div[@class='vehicle-selection']/div[@id='caracteristicas']"):
            yield {
                'link': link,
                'cod_all': cod_all,
                'fabricante': produtos.xpath(".//td[@class='value-field Fabricante']/text()").get(),
                'ean': produtos.xpath(".//td[@class='value-field Codigo-EAN']/text()").get(),
                'oem': produtos.xpath(".//td[@class='value-field Codigo-OEM']/text()").get()
            }
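Two plausible explanations, offered as hypotheses rather than a confirmed diagnosis: `//*` probably never returns 0 because it matches at least the root element of any parsed page, so the first check can never fire; and with the second check, raising `CloseSpider` shuts down the whole crawl, so `parse_produto` requests that are still queued when the empty page arrives may never be processed, which would explain the missing items. A common alternative is to stop paginating by simply not yielding the next page request once a listing page has no products. The sketch below models that decision in plain Python (no Scrapy required) so the logic can be checked in isolation. The helpers `page_url`, `next_page_url`, `crawl` and the fake fetcher are hypothetical; the query parameters are copied from the question's URL, and the assumption that an empty listing page marks the end comes from the question itself.

```python
from typing import Callable, List, Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Query parameters copied from the question's start URL; only PageNumber varies.
BASE_URL = "https://www.allpartsnet.com.br/buscapagina"
BASE_PARAMS = {
    "fq": "B:1228",
    "O": "OrderByNameASC",
    "PS": "50",
    "sl": "5d58b484-137e-4091-92ca-29d2e0c70f85",
    "cc": "1",
    "sm": "0",
}


def page_url(page: int) -> str:
    """Build the listing URL for a given page number (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({**BASE_PARAMS, 'PageNumber': page})}"


def next_page_url(page: int, product_links: List[str]) -> Optional[str]:
    """Return the next listing URL, or None once a page has no products.

    Returning None stands in for "don't yield another scrapy.Request"
    in the spider's parse(), instead of raising CloseSpider.
    """
    if not product_links:
        return None
    return page_url(page + 1)


def crawl(fetch_links: Callable[[str], List[str]]) -> List[str]:
    """Follow listing pages one by one until an empty page; collect all links."""
    collected: List[str] = []
    page, url = 0, page_url(0)
    while url is not None:
        links = fetch_links(url)
        collected.extend(links)
        url = next_page_url(page, links)
        page += 1
    return collected


if __name__ == "__main__":
    # Hypothetical fake site: pages 0-50 hold 50 product links each,
    # and any later page is empty.
    def fake_fetch(url: str) -> List[str]:
        page = int(parse_qs(urlparse(url).query)["PageNumber"][0])
        if page > 50:
            return []
        return [f"/produto-{page}-{i}" for i in range(50)]

    print(len(crawl(fake_fetch)))  # 2550
```

In the real spider this would roughly mean: in `parse`, collect the product links first; if there are none, `return` without raising `CloseSpider`; otherwise yield the `response.follow(...)` requests and then the next page request. Scrapy then finishes on its own once every queued product request has been handled.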