Python Scrapy - Request payload format and types

python ajax web-scraping scrapy

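This is the starting point of my scraping process.

It is an AJAX call that returns the data for each page as JSON.

My POST request fails with a 404 error. Requests that require a payload have caused me trouble in the past. I have always worked around the problem somehow, but this time I want to understand what I am doing wrong.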

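My questions are:

1. Does the request payload sent with a Scrapy request need to be of a specific type or format?
2. Do I need to call `json.dumps(payload)` before sending it, or can I send it as a dictionary?
3. Do I need to convert every key:value pair to strings before sending the payload?
4. Could there be another reason why my request fails?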

Here is the relevant part of my code:

import json

import scrapy
from scrapy.spiders import CrawlSpider


class MySpider(CrawlSpider):

    name = 'myspider'

    start_urls = [
        'https://www.storiaimoveis.com.br/api/search?fields=%24%24meta.geo.postalCodeAddress.city%2C%24%24meta.geo.postalCodeAddress.neighborhood%2C%24%24meta.geo.postalCodeAddress.street%2C%24%24meta.location%2C%24%24meta.created%2Caddress.number%2Caddress.postalCode%2Caddress.neighborhood%2Caddress.state%2Cmedia%2ClivingArea%2CtotalArea%2Ctypes%2Coperation%2CsalePrice%2CrentPrice%2CnewDevelopment%2CadministrationFee%2CyearlyTax%2Caccount.logoUrl%2Caccount.name%2Caccount.id%2Caccount.creci%2Cgarage%2Cbedrooms%2Csuites%2Cbathrooms%2Cref&optimizeMedia=true&size=20&from=0&sessionId=5ff29d7e-88d0-54d5-2641-e203cafd6f4e'
    ]

    page = 1
    payload = {"locations":[{"geo":{"top_left":{"lat":5.2717863,
                                                "lon":-73.982817},
                                    "bottom_right":{"lat":-34.0891,
                                                    "lon":-28.650543}},
                             "placeId":"ChIJzyjM68dZnAARYz4p8gYVWik",
                             "keywords":"Brasil",
                             "address":{"label":"Brasil","country":"BR"}}],
               "operation":["RENT"],
               "bathrooms":[],
               "bedrooms":[],
               "garage":[],
               "features":[]}
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'Referer': 'https://www.storiaimoveis.com.br/alugar/brasil',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
    }


    def parse(self, response):
        for url in self.start_urls:
            yield scrapy.Request(url=url,
                                 method='POST',
                                 headers=self.headers,
                                 body=json.dumps(self.payload),
                                 callback=self.parse_items)

    def parse_items(self, response):
        # Open an interactive shell on the response, then print its body.
        from scrapy.shell import inspect_response
        inspect_response(response, self)
        print(response.text)

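Yes, you need to call `json.dumps(payload)`, because the request body needs to be a str or unicode, as stated in the documentation.

However, in your case the request fails because the following two headers are missing: `Content-Type` and `Referer`.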
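As a minimal sketch of what `json.dumps` does to a payload like the one in the question (trimmed here for brevity), the snippet below uses only the standard library. It suggests that the whole dictionary becomes a single string, so individual values should not need converting to strings first; that string is what is passed as `body=` in the question's code.

```python
import json

# Nested payload with mixed value types, trimmed from the question's payload.
payload = {
    "locations": [{"geo": {"top_left": {"lat": 5.2717863, "lon": -73.982817}},
                   "keywords": "Brasil"}],
    "operation": ["RENT"],
    "bathrooms": [],
}

# json.dumps serializes the whole dict into one str. Floats, lists and
# nested dicts become JSON values, so no per-key str() conversion is needed.
body = json.dumps(payload)

print(type(body).__name__)          # str
print(json.loads(body) == payload)  # True: the round trip preserves the data

# If a client needs bytes instead of str, encode the string explicitly.
body_bytes = body.encode("utf-8")
```

Passing the dictionary itself as the body skips this serialization step, which is why the explicit `json.dumps` call matters.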

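To get the correct request headers, I usually do the following:

1. Inspect the headers in Chrome DevTools.
2. Make the request with curl or Postman until I get the headers right. In this case, `Content-Type` and `Referer` seem to be enough to get an HTTP 200 response status.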
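The same trial-and-error can be scripted instead of done by hand in curl or Postman. The sketch below is one possible approach using only the standard library, not the answerer's method: it adds candidate headers one at a time and reports the status code for each set. The URL, payload, and helper names (`build_post`, `status_for`, `header_subsets`, `probe`) are hypothetical placeholders.

```python
import json
import urllib.error
import urllib.request


def build_post(url, payload, headers):
    """Build (but do not send) a JSON POST request with the given headers."""
    data = json.dumps(payload).encode("utf-8")  # urllib expects bytes
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def status_for(request, timeout=10):
    """Send the request and return the HTTP status code, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def header_subsets(headers):
    """Yield header dicts that grow by one header at a time, in order."""
    chosen = {}
    for name, value in headers.items():
        chosen[name] = value
        yield dict(chosen)


def probe(url, payload, headers):
    """Return (header names, status code) for each growing header set."""
    return [(sorted(subset), status_for(build_post(url, payload, subset)))
            for subset in header_subsets(headers)]


# Hypothetical endpoint, payload and candidate headers, for illustration only.
URL = "https://example.com/api/search"
PAYLOAD = {"operation": ["RENT"]}
CANDIDATES = {
    "Content-Type": "application/json",
    "Referer": "https://example.com/search",
}
```

Calling `probe(URL, PAYLOAD, CANDIDATES)` against the real endpoint would send one request per header set; the first set that returns 200 is a reasonable starting point for the headers to copy into the spider.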

Try explaining the steps you took to create the initial search URL manually, and how you built the URL that the script uses.