Python 2.7: Scraping data from an infinite-scroll page with Scrapy

python-2.7 scrapy

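I am new to Python and Scrapy.

I want to scrape data from a website.

The website uses AJAX to load more results as you scroll.

The GET request URL looks like this: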

http://www.justdial.com/functions/ajxsearch.php?national_search=0&act=pagination&city=Mumbai&search=Chemical+Dealers&where=&catid=944&psearch=&prid=&page=2&SID=&mntypgrp=0&toknbkt=&bookDate=

Please help me with how to do this using Scrapy or any other Python library.

Thanks.

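This AJAX request seems to require a correct Referer header, which is simply the URL of the current page. You can set the header when you create the request: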

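from scrapy import Request

# Both methods belong inside your scrapy.Spider subclass.
def parse(self, response):
    # response.url is the listing page,
    # e.g. http://www.justdial.com/Mumbai/Dentists/ct-385543
    my_headers = {'Referer': response.url}
    yield Request("ajax_request_url",  # placeholder: put the real AJAX URL here
                  headers=my_headers,
                  callback=self.parse_ajax)

def parse_ajax(self, response):
    # results should be here
    pass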
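
The "ajax_request_url" placeholder above has to be filled in with a real URL. As a rough sketch, it could be assembled from the query parameters visible in the question's URL. The helper name build_ajax_url and the constant AJAX_ENDPOINT are made up for illustration, and which parameters the server actually requires is an assumption, since the thread does not say.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

# Endpoint copied from the URL in the question.
AJAX_ENDPOINT = "http://www.justdial.com/functions/ajxsearch.php"


def build_ajax_url(city, search, catid, page):
    """Hypothetical helper: rebuild the pagination URL from the question.

    Parameter names and their order are copied from that URL; the empty
    ones are kept as-is because it is unknown whether they can be dropped.
    """
    params = [
        ("national_search", 0),
        ("act", "pagination"),
        ("city", city),
        ("search", search),  # spaces are encoded as '+'
        ("where", ""),
        ("catid", catid),
        ("psearch", ""),
        ("prid", ""),
        ("page", page),
        ("SID", ""),
        ("mntypgrp", 0),
        ("toknbkt", ""),
        ("bookDate", ""),
    ]
    return AJAX_ENDPOINT + "?" + urlencode(params)
```

Calling build_ajax_url("Mumbai", "Chemical Dealers", 944, 2) reproduces the URL from the question, so the result can be passed to Request in place of the placeholder.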

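Thanks for your answer. I will try it and let you know.

I have tried the code and tried scraping the data, but I only get the first 10 records. I want to scrape the data from all pages.

@JT28 You just need to increment the page URL parameter, i.e. this part of the URL:

&page=2

for the other pages, until there are no results, which probably means you have gone past the last page.

So in this case, my response URL is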

http://www.justdial.com/functions/ajxsearch.php?national_search=0&act=pagination&city=Mumbai&search=Chemical+Dealers&where=&catid=944&psearch=&prid=&page=2&SID=&mntypgrp=0&toknbkt=&bookDate=

right? Because in

url = # e.g. http://www.justdial.com/Mumbai/Dentists/ct-385543

I cannot directly put or find the &page option.

@JT28 Yes, you need to modify response.url, not the Referer header. You can use something like

next_url = re.sub(r'page=\d+', 'page={}'.format(next_page), response.url)

where

next_page = int(re.findall(r'page=(\d+)', response.url)[0]) + 1

or something along those lines; you get the idea :)
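
The two one-liners from the last comment can be folded into one small function. This is only a sketch: the name next_page_url is made up for illustration, and it assumes the URL carries a numeric page parameter like the one in the question.

```python
import re

# Match 'page=<digits>' only when it is a whole query parameter, so that a
# parameter whose name merely ends in 'page' is left alone.
PAGE_RE = re.compile(r"([?&]page=)(\d+)")


def next_page_url(url):
    """Return url with its page parameter incremented by one.

    Returns None when the URL has no numeric page parameter.
    """
    match = PAGE_RE.search(url)
    if match is None:
        return None
    next_page = int(match.group(2)) + 1
    # Replace only the digits, keeping the rest of the URL untouched.
    return url[:match.start(2)] + str(next_page) + url[match.end(2):]
```

Inside parse_ajax, one option is to yield Request(next_page_url(response.url), headers=my_headers, callback=self.parse_ajax) while the response still contains results, and to stop once a page comes back empty. How to recognise an empty page depends on the response format, which the thread does not show.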