Python Scrapy: simulating an AJAX request with headers and a request payload
Here is the website I am trying to scrape. When the page opens, the listings are generated by an AJAX request, and the same request keeps populating the page every time you scroll down; that is how they implement the infinite scroll. While scrolling, I found the request that gets sent to the server, and I tried to simulate that same request with its headers and request payload. Here is my spider:
class MySpider(scrapy.Spider):
    name = 'kralilanspider'
    allowed_domains = ['kralilan.com']
    start_urls = [
        'https://www.kralilan.com/liste/satilik-bina'
    ]

    def parse(self, response):
        headers = {
            'Referer': 'https://www.kralilan.com/liste/kiralik-bina',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0',
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate, br',
            #'Content-Type': 'application/json; charset=utf-8',
            #'X-Requested-With': 'XMLHttpRequest',
            #'Content-Length': 246,
            #'Connection': 'keep-alive',
        }

        yield scrapy.Request(
            url='https://www.kralilan.com/services/ki_operation.asmx/getFilter',
            method='POST',
            headers=headers,
            callback=self.parse_ajax
        )

    def parse_ajax(self, response):
        yield {'data': response.text}
If I uncomment the commented-out headers, the request fails with status code 400 or 500.

I tried sending the request payload as the body inside the parse method. That did not work either.

If I try to yield response.body, I get TypeError: Object of type bytes is not JSON serializable.

What am I missing here?

The following implementation will fetch you the response you are after. You missed the most important part: the data that needs to be passed as a payload with the POST request.
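That TypeError comes from the JSON feed exporter rather than from the request itself: response.body is bytes, and json.dumps refuses to serialize bytes, whereas response.text is the decoded string. A minimal reproduction outside Scrapy (the body value below is a made-up stand-in for the server's response):

```python
import json

# response.body in Scrapy is bytes; a JSON feed exporter serializes yielded
# items with json.dumps, which raises TypeError on bytes values.
body = b'{"d": "<div>listing markup</div>"}'

try:
    json.dumps({'data': body})
except TypeError as e:
    print(e)  # Object of type bytes is not JSON serializable

# Decoding first (which is what response.text gives you) fixes it.
print(json.dumps({'data': body.decode('utf-8')}))
```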
import json
import scrapy

class MySpider(scrapy.Spider):
    name = 'kralilanspider'

    data = {
        'incomestr': '["Bina","1",-1,-1,-1,-1,-1,5]',
        'intextstr': '{"isCoordinates":false,"ListDrop":[],"ListText":[{"id":"78","Min":"","Max":""},{"id":"107","Min":"","Max":""}],"FiyatData":{"Max":"","Min":""}}',
        'index': 0,
        'count': '10',
        'opt': '1',
        'type': '3',
    }

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.kralilan.com/services/ki_operation.asmx/getFilter',
            method='POST',
            body=json.dumps(self.data),
            headers={"content-type": "application/json"}
        )

    def parse(self, response):
        items = json.loads(response.text)['d']
        yield {"data": items}
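The two pieces that make this work are the serialized body and the matching content-type header: the ASMX endpoint parses the POST body as a JSON document, so a form-encoded body or a raw Python dict string would be rejected. A quick stdlib check of what actually goes over the wire (the values are the ones from the spider above):

```python
import json

# The payload exactly as the spider defines it; note that 'index' is an
# int while the other values are strings, mirroring the browser's request.
data = {
    'incomestr': '["Bina","1",-1,-1,-1,-1,-1,5]',
    'intextstr': ('{"isCoordinates":false,"ListDrop":[],"ListText":'
                  '[{"id":"78","Min":"","Max":""},{"id":"107","Min":"","Max":""}],'
                  '"FiyatData":{"Max":"","Min":""}}'),
    'index': 0,
    'count': '10',
    'opt': '1',
    'type': '3',
}

body = json.dumps(data)  # what scrapy.Request sends as the POST body
print(body)
```

On Scrapy 1.7+, scrapy.http.JsonRequest(url, data=self.data) performs this serialization and sets the Content-Type header for you.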
If you want to parse the data from multiple pages (a new page index is issued as you scroll down), the following will do the trick. The pagination lives in the index key of the payload:
import json
import scrapy

class MySpider(scrapy.Spider):
    name = 'kralilanspider'

    data = {
        'incomestr': '["Bina","1",-1,-1,-1,-1,-1,5]',
        'intextstr': '{"isCoordinates":false,"ListDrop":[],"ListText":[{"id":"78","Min":"","Max":""},{"id":"107","Min":"","Max":""}],"FiyatData":{"Max":"","Min":""}}',
        'index': 0,
        'count': '10',
        'opt': '1',
        'type': '3',
    }
    headers = {"content-type": "application/json"}
    url = 'https://www.kralilan.com/services/ki_operation.asmx/getFilter'

    def start_requests(self):
        yield scrapy.Request(
            url=self.url,
            method='POST',
            body=json.dumps(self.data),
            headers=self.headers,
            meta={'index': 0}
        )

    def parse(self, response):
        items = json.loads(response.text)['d']
        res = scrapy.Selector(text=items)
        for item in res.css(".list-r-b-div"):
            title = item.css(".add-title strong::text").get()
            price = item.css(".item-price::text").get()
            yield {"title": title, "price": price}

        page = response.meta['index'] + 1
        self.data['index'] = page
        yield scrapy.Request(
            self.url,
            headers=self.headers,
            method='POST',
            body=json.dumps(self.data),
            meta={'index': page}
        )
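One caveat: as written, the spider increments index indefinitely, so in practice you would stop once a page comes back with no listings. A minimal sketch of that loop outside Scrapy (fetch_page is a hypothetical stand-in for the POST/parse cycle):

```python
# Sketch of the pagination above with a stop condition: when a page
# returns no listings, stop requesting the next index.
def fetch_page(index):
    pages = {0: ["item-a", "item-b"], 1: ["item-c"]}  # fake paged results
    return pages.get(index, [])

def crawl():
    index, results = 0, []
    while True:
        items = fetch_page(index)
        if not items:      # empty page -> no more results
            break
        results.extend(items)
        index += 1         # same increment as meta={'index': page}
    return results

print(crawl())  # ['item-a', 'item-b', 'item-c']
```

In the spider itself, the equivalent check is simply returning early from parse when the selector finds no .list-r-b-div nodes.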
Why are you ignoring the body of the POST request? You need to submit it too:
def parse(self, response):
    headers = {
        'Referer': 'https://www.kralilan.com/liste/kiralik-bina',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Content-Type': 'application/json; charset=utf-8',
        'X-Requested-With': 'XMLHttpRequest',
        #'Content-Length': 246,
        #'Connection': 'keep-alive',
    }

    payload = """
    { incomestr:'["Bina","2",-1,-1,-1,-1,-1,5]', intextstr:'{"isCoordinates":false,"ListDrop":[],"ListText":[{"id":"78","Min":"","Max":""},{"id":"107","Min":"","Max":""}],"FiyatData":{"Max":"","Min":""}}', index:'0' , count:'10' , opt:'1' , type:'3'}
    """

    yield scrapy.Request(
        url='https://www.kralilan.com/services/ki_operation.asmx/getFilter',
        method='POST',
        body=payload,
        headers=headers,
        callback=self.parse_ajax
    )
Thanks for your answer. It may well be correct, but I tried this as well and got json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). Any idea why that happens? The error itself is not very helpful.

I tried to reproduce the problem you ran into, but unfortunately it gives me the desired results every time I run it. I've proposed an update in case you want to parse the data from multiple pages. Thanks.

Hi @SIM, is there a way to supply one long query for the next pages in which only a single variable changes?
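For what it's worth, that JSONDecodeError at line 1 column 1 usually means the server answered with an empty body or an HTML error page instead of JSON, so json.loads fails on the very first character. A small hedged sketch (parse_service_response is a hypothetical helper, not part of either spider) that surfaces what actually came back:

```python
import json

# "Expecting value: line 1 column 1 (char 0)" from json.loads typically means
# the response body was empty or was HTML (e.g. a 500 error page), not JSON.
def parse_service_response(text):
    try:
        return json.loads(text)['d']
    except json.JSONDecodeError:
        # Re-raise with the first bytes of the body, so the log shows
        # what the server actually sent instead of JSON.
        raise ValueError(f"Non-JSON response, starts with: {text[:80]!r}")

print(parse_service_response('{"d": "<div>ok</div>"}'))  # normal JSON payload
try:
    parse_service_response('<html>error page</html>')    # what an error page looks like
except ValueError as e:
    print(e)
```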