Python Scrapy pagination - stopping the crawl
Can someone help me understand what mistake I am making in the pagination of this code?
When I try a generic XPath:
    if len(response.xpath("//*")) == 0:
        raise CloseSpider('No more products to scrape...')
I get all the data, but the spider never stops.
When I try this XPath instead:
    if len(response.xpath("//h3[@class='shelf-product-name ']/a/@href")) == 0:
        raise CloseSpider('No more products to scrape...')
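The pages range from 0 to 50, which in theory should return 2550 items (50 items per page).
But when I use the second XPath, the spider stops at some point, and I don't know why.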
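As a quick sanity check of the expected total, the arithmetic can be written out. The constant names below are illustrative only; the values come from the question (pages 0 through 50 inclusive, 50 items per page).

```python
# Values taken from the question; the names are made up for this sketch.
FIRST_PAGE = 0
LAST_PAGE = 50
ITEMS_PER_PAGE = 50

# The range is inclusive on both ends, so it covers 51 pages, not 50.
page_count = LAST_PAGE - FIRST_PAGE + 1
expected_items = page_count * ITEMS_PER_PAGE

print(page_count, expected_items)
```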
    import scrapy
    from scrapy.exceptions import CloseSpider


    class ProdutosSpider(scrapy.Spider):
        name = 'produtos_aplus'
        allowed_domains = ['www.allpartsnet.com.br']
        start_urls = ["https://www.allpartsnet.com.br/buscapagina?fq=B%3a1228&O=OrderByNameASC&PS=50&sl=5d58b484-137e-4091-92ca-29d2e0c70f85&cc=1&sm=0&PageNumber=0"]
        page = 0

        def parse(self, response):
            if len(response.xpath("//h3[@class='shelf-product-name ']/a/@href")) == 0:
                raise CloseSpider('No more products to scrape...')

            for produtos in response.xpath("//div[@class='QD prateleira row qd-xs n1colunas']/ul"):
                link = produtos.xpath(".//h3[@class='shelf-product-name ']/a/@href").get()
                cod_all = produtos.xpath(".//span[@class='insert-sku-name']/text()").get()
                yield response.follow(url=link, callback=self.parse_produto, meta={'link': link, 'cod_all': cod_all})

            self.page += 1
            yield scrapy.Request(
                url=f'https://www.allpartsnet.com.br/buscapagina?fq=B%3a1228&O=OrderByNameASC&PS=50&sl=5d58b484-137e-4091-92ca-29d2e0c70f85&cc=1&sm=0&PageNumber={self.page}',
                callback=self.parse
            )

        def parse_produto(self, response):
            link = response.request.meta['link']
            cod_all = response.request.meta['cod_all']

            for produtos in response.xpath("//div[@class='vehicle-selection']/div[@id='caracteristicas']"):
                yield {
                    'link': link,
                    'cod_all': cod_all,
                    'fabricante': produtos.xpath(".//td[@class='value-field Fabricante']/text()").get(),
                    'ean': produtos.xpath(".//td[@class='value-field Codigo-EAN']/text()").get(),
                    'oem': produtos.xpath(".//td[@class='value-field Codigo-OEM']/text()").get()
                }
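One way to make the stop condition explicit is to compute the next page URL from the current one and simply schedule nothing when a page has no products. The sketch below is only an illustration under assumptions: `next_page_url` is a hypothetical helper (not part of Scrapy), and it assumes the page index lives in the `PageNumber` query parameter, as in the start URL above. It uses only the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url, has_products):
    """Return the URL of the following page, or None when paging should stop.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    # An empty listing page means there is nothing further to request.
    if not has_products:
        return None

    parts = urlsplit(current_url)
    query = parse_qsl(parts.query, keep_blank_values=True)

    # Without a PageNumber parameter there is nothing to increment.
    if not any(key == "PageNumber" for key, _ in query):
        return None

    bumped = [
        (key, str(int(value) + 1)) if key == "PageNumber" else (key, value)
        for key, value in query
    ]
    # urlencode re-encodes values, so escapes may change case (%3a becomes %3A).
    return urlunsplit(parts._replace(query=urlencode(bumped)))
```

In `parse`, this could be called as `next_page_url(response.url, bool(links))`, where `links` is the list of product hrefs matched on that page, yielding a `scrapy.Request` only when the result is not `None`. Letting `parse` return without scheduling another page allows Scrapy to finish the product requests that are already queued before the spider closes on its own, whereas raising `CloseSpider` shuts the spider down and may discard requests still waiting in the scheduler.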