Python: how do I scrape an infinite-scroll page?
I am trying to scrape the men's coats and jackets category on next.co.uk, and I realised that the page uses infinite scrolling.
# -*- coding: utf-8 -*-
import scrapy
from ..items import NextItem


class NewoneSpider(scrapy.Spider):
    name = 'newOne'
    allowed_domains = ['www.next.co.uk']
    start_urls = [
        'https://www.next.co.uk/shop/gender-newbornboys-gender-newbornunisex-gender-olderboys-gender-youngerboys-productaffiliation-coatsandjackets-0'
    ]

    def parse(self, response):
        # Each product card on the listing page is wrapped in a .Details element.
        products = response.css('.Details')
        for product in products:
            # Create a fresh item per product so yielded items do not share state.
            items = NextItem()
            productCategory = 'Furniture'
            productSubCategory = 'living Room'
            productCountry = 'uk'
            productSeller = 'John Lewis'
            productLink = product.css('.TitleText::attr(href)').extract_first()
            productTitle = product.css('.Desc::text').extract_first()
            productImage = product.css('.Image img::attr(src)').extract_first()
            productSalePrice = product.css('.Price a::text').extract_first()
            items['productCategory'] = productCategory
            items['productSubCategory'] = productSubCategory
            items['productCountry'] = productCountry
            items['productSeller'] = productSeller
            items['productLink'] = productLink
            items['productTitle'] = productTitle
            items['productImage'] = productImage
            items['productSalePrice'] = productSalePrice
            yield items
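I am able to scrape 28 items, but I can see that the site, which implements infinite scrolling, has more items than that.

When you scroll down the page, an XHR call is sent to the server to request more data. Each request is almost identical, but the last element of the URL grows by 24: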
- srt-24
- srt-48
- srt-72
import requests

URL_TEMPLATE = 'https://www.next.co.uk/shop/gender-newbornboys-gender-newbornunisex-gender-olderboys-gender-youngerboys-productaffiliation-coatsandjackets/isort-score-minprice-0-maxprice-30000-srt-{}'

# Request the offsets 24, 48, ..., 216 (the stop value 240 is excluded).
for step in range(24, 240, 24):
    r = requests.get(URL_TEMPLATE.format(step))
    if r.status_code == 200:
        # TODO We have the data - let's parse it
        pass
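The loop above hard-codes the upper bound of 240. If the number of items in the category is not known in advance, one option is to keep requesting pages until the server stops answering with 200. The following is only a sketch: `iter_pages`, `fetch` and `max_pages` are names made up for illustration, and it assumes (this is not verified for next.co.uk) that a request past the last page returns a non-200 status. The `fetch` argument is any callable that takes a URL and returns an object with a `status_code` attribute, such as `requests.get`.

```python
def iter_pages(url_template, fetch, page_size=24, max_pages=50):
    """Yield (offset, response) pairs until a request stops returning 200.

    url_template must contain one '{}' placeholder for the srt offset.
    max_pages is a safety limit so the loop always terminates.
    """
    for page in range(1, max_pages + 1):
        offset = page * page_size
        response = fetch(url_template.format(offset))
        if response.status_code != 200:
            # Assumption: a non-200 status means there are no more pages.
            break
        yield offset, response
```

It could be used as `for offset, r in iter_pages(URL_TEMPLATE, requests.get): ...`. If the site instead returns 200 with an empty product list past the last page, the stop condition would need to check the parsed content rather than the status code.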
What if I have two links that I want to scrape, like ... and ...?
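What would the code look like? It is still scraping the normal number of items.
@fafoworatobi I'm not sure I understand your comment.
It starts from the second page, and only scrapes items from the second page.
@fafoworatobi Have you looked at my sample code? Can you share your updated code?
It's working. I saw your example, and I have now been able to scrape 216 items from that category.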
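For the two-link case asked about in the comments, one possible approach is to build the paginated URLs for every category up front and then fetch them all. This is a rough sketch rather than tested code for the real site: `build_page_urls` is a hypothetical helper, and the two entries in `URL_TEMPLATES` are placeholders that would have to be replaced with the real category URLs, each ending in `srt-{}`.

```python
def build_page_urls(url_templates, page_size=24, stop=240):
    """Return the paginated URLs for each template, in template order.

    Offsets run page_size, 2 * page_size, ... and stop before `stop`,
    matching range(24, 240, 24) in the answer.
    """
    return [
        template.format(offset)
        for template in url_templates
        for offset in range(page_size, stop, page_size)
    ]


# Placeholder templates (assumptions): substitute the two real category URLs.
URL_TEMPLATES = [
    'https://www.next.co.uk/shop/<first-category>/srt-{}',
    'https://www.next.co.uk/shop/<second-category>/srt-{}',
]

page_urls = build_page_urls(URL_TEMPLATES)
```

In a Scrapy spider the resulting list can be assigned to `start_urls`, so that `parse` is called for every page of both categories. Note that the offsets start at 24, so the first page of each category still has to be added separately.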