Python: Pagination with Scrapy


python web-scraping scrapy


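I'm trying to crawl this website:

I can get all the products on this page, but how do I make a request for the "View More" link at the bottom of the page?

My code so far is: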

rules = (
    Rule(SgmlLinkExtractor(restrict_xpaths='//li[@class="normalLeft"]/div/a',unique=True)),
    Rule(SgmlLinkExtractor(restrict_xpaths='//div[@id="topParentChilds"]/div/div[@class="clm2"]/a',unique=True)),
    Rule(SgmlLinkExtractor(restrict_xpaths='//p[@class="proHead"]/a',unique=True)),
    Rule(SgmlLinkExtractor(allow=('http://[^/]+/[^/]+/[^/]+/[^/]+$', ), deny=('/about-us/about-us/contact-us', './music.html',  ) ,unique=True),callback='parse_item'),
)

Any help?

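First, you should take a look at this thread on how to scrape content that is loaded dynamically via AJAX:

Clicking the "View More" button triggers an XHR request: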

http://www.aido.com/eshop/faces/tiles/category.jsp?q=&categoryID=189&catalogueID=2&parentCategoryID=185&viewType=grid&bnm=&atmSize=&format=&gender=&ageRange=&actor=&director=&author=&region=&compProductType=&compOperatingSystem=&compScreenSize=&compCpuSpeed=&compRam=&compGraphicProcessor=&compDedicatedGraphicMemory=&mobProductType=&mobOperatingSystem=&mobCameraMegapixels=&mobScreenSize=&mobProcessor=&mobRam=&mobInternalStorage=&elecProductType=&elecFeature=&elecPlaybackFormat=&elecOutput=&elecPlatform=&elecMegaPixels=&elecOpticalZoom=&elecCapacity=&elecDisplaySize=&narrowage=&color=&prc=&k1=&k2=&k3=&k4=&k5=&k6=&k7=&k8=&k9=&k10=&k11=&k12=&startPrize=&endPrize=&newArrival=&entityType=&entityId=&brandId=&brandCmsFlag=&boutiqueID=&nmt=&disc=&rat=&cts=empty&isBoutiqueSoldOut=undefined&sort=12&isAjax=true&hstart=24&targetDIV=searchResultDisplay

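which returns the next 24 items as text/html. Note the hstart=24 parameter: the first time you click "View More" it equals 24, the second time 48, and so on. This is what makes the pagination scriptable.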
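As a rough sketch of that progression, the captured URL can be rewritten with a new hstart for each simulated click. The helper name `with_hstart` is made up for illustration, and `captured` is an abbreviated copy of the XHR URL above (most of the empty filter parameters are left out for brevity).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_hstart(url, hstart):
    """Return `url` with its hstart query parameter set to `hstart`."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as "q=".
    query = [
        (key, str(hstart) if key == "hstart" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Abbreviated copy of the captured XHR URL (assumption: shortened for brevity).
captured = (
    "http://www.aido.com/eshop/faces/tiles/category.jsp"
    "?q=&categoryID=189&catalogueID=2&parentCategoryID=185&viewType=grid"
    "&sort=12&isAjax=true&hstart=24&targetDIV=searchResultDisplay"
)

# One URL per simulated "View More" click: hstart = 24, 48, 72.
for click in range(1, 4):
    print(with_hstart(captured, 24 * click))
```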

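Now you should simulate these requests in your spider. The recommended way is to instantiate Scrapy's Request object with a callback, and extract the data inside that callback.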

Hope that helps.

This is helpful, but an example of how to "instantiate Scrapy's Request object" would be even more useful.