Python 2.7: Scrapy POST request is redirected to the wrong page


I am trying to reach the details page of a site.

To get there in a browser:

1. Click "Consultar Titulo".
2. Choose ORO from the minerals dropdown.
3. Click "Buscar".
4. Click the first item in the list.

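Dev tools and Fiddler show that I should make a POST request with the item id as the payload, and that this POST request is then redirected to the details page.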

In my case, I am redirected to the home page instead. What am I missing?

Here is my Scrapy spider:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.shell import inspect_response



class CodeSpider(scrapy.Spider):
    name = "col"

    start_urls =['http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/index.cmc']

    headers = {
        "Connection": "keep-alive",
        "Cache-Control": "max-age=0",
        "Origin": "http://www.cmc.gov.co:8080",
        "Upgrade-Insecure-Requests": "1",
        "DNT": "1",
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Referer": "http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/busqueda.cmc",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "en-US,en;q=0.9,ru;q=0.8,uk;q=0.7",
    }


    def parse(self, response):
        inspect_response(response, self)
        payload = {'expediente': '29', 'tipoSolicitud': ''}
        url = 'http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/busqueda.cmc'
        yield scrapy.FormRequest(url,  formdata = payload, headers=self.headers, callback = self.parse, dont_filter=True)

Here is the log showing the redirect:

2018-08-23 13:58:05 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/index.cmc> from <POST http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/busqueda.cmc>
2018-08-23 13:58:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/index.cmc> (referer: http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/busqueda.cmc)

What am I missing?

Also, if I use the code generated by Postman with a GET for the details page, it works and returns the page. The same headers in the Scrapy shell get redirected:

In [1]: url = "http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/detalleExpedienteTitulo.cmc"
   ...:
   ...: headers = {
   ...:     'upgrade-insecure-requests': "1",
   ...:     'user-agent': "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
   ...:     'dnt': "1",
   ...:     'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
   ...:     'referer': "http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/busqueda.cmc",
   ...:     'accept-encoding': "gzip, deflate",
   ...:     'accept-language': "en-US,en;q=0.9,ru;q=0.8,uk;q=0.7",
   ...:     'cookie': "PHPSESSID=2ba8dsre6l42un95qu33k09ud6",
   ...:     'cache-control': "no-cache",
   ...:     }
   ...:

In [2]: fetch(url, headers=headers)
2018-08-23 14:47:13 [scrapy.core.engine] INFO: Spider opened
2018-08-23 14:47:13 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/index.cmc> from <GET http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/detalleExpedienteTitulo.cmc>
2018-08-23 14:47:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/index.cmc> (referer: http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/busqueda.cmc)

It looks like I am missing the POST request here. That POST request generates the correct session ID, which is new for every other search.
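
One way to check whether the session cookie is the deciding factor is to reproduce the flow outside Scrapy with a cookie jar, so that whatever cookie the server sets on the POST is sent again on the redirect. The sketch below uses only the standard library (not Scrapy). The helper names `make_session` and `search_request` and the `BASE` constant are made up for illustration; the URL and payload are the ones from the spider above. Whether the site needs extra steps before the POST is an assumption this sketch does not settle.

```python
try:  # Python 3
    from http.cookiejar import CookieJar
    from urllib.parse import urlencode
    from urllib.request import HTTPCookieProcessor, Request, build_opener
except ImportError:  # Python 2.7
    from cookielib import CookieJar
    from urllib import urlencode
    from urllib2 import HTTPCookieProcessor, Request, build_opener

# Base URL taken from the spider in the question.
BASE = 'http://www.cmc.gov.co:8080/CmcFrontEnd/consulta/'


def make_session():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def search_request(expediente):
    """Build the POST to busqueda.cmc with the payload used in the spider."""
    body = urlencode([('expediente', expediente), ('tipoSolicitud', '')])
    # Supplying a body makes this a POST request.
    return Request(BASE + 'busqueda.cmc', data=body.encode('ascii'),
                   headers={'Referer': BASE + 'busqueda.cmc'})
```

With network access, calling `opener.open(BASE + 'index.cmc')` and then `opener.open(search_request('29'))` on an opener from `make_session()` would reuse the cookies from the first response, and `geturl()` on the second response would show where the redirects ended. If that lands on the details page, the difference from the Scrapy run is probably in how the session cookie is obtained or sent.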

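The website also issues a 302 redirect for successful requests. However, that redirect should point to the details page. The payload and headers seem correct. You might want to try sending the numeric header values as numbers instead of strings. My guess is that the page does not validate your request, either because the integers in your payload are sent as strings or because something is wrong with the PHP session ID.

@Casper Yes, that is true. As far as I can tell, the site takes the session ID from this POST request and then redirects to the details page with a GET request. The session ID is the only thing that differs between details pages. When I inspect response.headers and request.headers, the session ID appears to be correct.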
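
On the numbers-versus-strings suggestion: with `Content-Type: application/x-www-form-urlencoded`, every value is serialized to text in the request body, so the server cannot tell an integer from a string. A quick check with the standard library's `urlencode` (used here instead of Scrapy's `FormRequest`, which is assumed to produce an equivalent body) illustrates this:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

# Same payload as in the spider, once with a string and once with an int.
as_string = urlencode([('expediente', '29'), ('tipoSolicitud', '')])
as_number = urlencode([('expediente', 29), ('tipoSolicitud', '')])

print(as_string)               # expediente=29&tipoSolicitud=
print(as_number)               # expediente=29&tipoSolicitud=
print(as_string == as_number)  # True
```

Since both bodies are byte-for-byte identical, the value types in the payload are unlikely to explain the redirect, which leaves the session ID as the more plausible cause.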