Python: sending an Ajax request to an ASMX web service in Scrapy


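I'm trying to scrape a website. The page source shows that every time the page loads, the form receives a VarsSessionID from the server. When the "Continue" button is clicked, the form sends an AJAX request to an ASMX web service. The web service returns a redirect to a new URL that displays the search results.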
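To confirm which values the page actually issues on load, it can help to list the hidden inputs of the form. The spider's XPath `//*[@id='VarsSessionID']/@value` suggests VarsSessionID is an element with a `value` attribute; the sketch below assumes it and `__VIEWSTATE` are `<input type="hidden">` fields inside the form `frmChangePage` (the form id used by the spider). `HiddenInputCollector` and `hidden_fields` are hypothetical helpers built on the standard library only; inside a spider, `response.xpath(...)` does the same job.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name -> value for every <input type="hidden"> inside one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            # Start collecting once the target form opens.
            self.in_form = True
        elif self.in_form and tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            # ASP.NET renders both name and id; prefer name, fall back to id.
            name = attrs.get("name") or attrs.get("id")
            if name:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def hidden_fields(html, form_id):
    """Return the hidden fields of the form with the given id."""
    parser = HiddenInputCollector(form_id)
    parser.feed(html)
    parser.close()
    return parser.fields
```

In a callback this would be called as `hidden_fields(response.text, "frmChangePage")`, which makes it easy to see whether VarsSessionID is empty or populated on a given page load.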

I have implemented my Scrapy spider so that it submits the AJAX POST request, as follows:

import json

import scrapy
from scrapy.http import FormRequest, Request
from scrapy.utils.response import open_in_browser


class TestSpider(scrapy.Spider):
    name = "test"
    allowed_domains = ['customer2.videcom.com']
    start_urls = ['http://customer2.videcom.com/med-view/VARS/Public/CustomerPanels/requirements.aspx?country=ng&lang=en']

    def parse(self, response):
        # from_response copies the form's fields from the page; formdata
        # overrides them, so this hard-coded __VIEWSTATE replaces the fresh one.
        form_data = {
            'VarsSessionID': '',
            '__VIEWSTATE': '/wEPDwULLTE4MTk4NDM5NjEPZBYCAgMPZBYCAgMPFgIeB1Zpc2libGVoZGSNuC4VK36MoPTmce49gcH1j2nxAPDYsLXii0G/syddwQ=='}
        yield FormRequest.from_response(response,
                                        formid='frmChangePage',
                                        formdata=form_data,
                                        method='POST',
                                        callback=self.after_parse,
                                        url='http://customer2.videcom.com/med-view/VARS/Public/CustomerPanels/requirements.aspx?country=ng&lang=en',
                                        )

    def after_parse(self, response):
        print("====RESPONSE===")
        print(response.headers)
        print("==========")
        print(response.request.headers)
        print("==========")
        # Session ID and view state issued by the server for this page load
        VarsSessionID = response.xpath("//*[@id='VarsSessionID']/@value").get()
        viewstate = response.xpath("//*[@id='__VIEWSTATE']/@value").get()
        print("VarsSessionID:", VarsSessionID)
        print("__VIEWSTATE:", viewstate)
        if VarsSessionID is None:
            self.logger.error("VarsSessionID not found on %s", response.url)
            return
        url = "http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=" + VarsSessionID
        # JSON body for the ASMX web method
        payload = {
            "FormData":
                {
                    'Origin': ['LOS'],
                    'VarsSessionID': VarsSessionID,
                    'Destination': ['ABV'],
                    'DepartureDate': ['05-May-2017'],
                    'ReturnDate': '',
                    'Adults': '1',
                    'Children': '0',
                    'SmallChildren': '0',
                    "Seniors": '0',
                    "Students": '0',
                    "Infants": '0',
                    "Youths": '0',
                    "Teachers": '0',
                    "SeatedInfants": '0',
                    "EVoucher": '',
                    "recaptcha": 'SHOW',
                    "SearchUser": 'PUBLIC',
                    "SearchSource": "requirements"
                }, "IsMMBChangeFlightMode": 'false'
        }
        # No manual 'Cookie' header: a dict is not a valid header value, and
        # Scrapy's CookiesMiddleware already sends ASP.NET_SessionId.
        headers = {
            'Accept': 'application/json, text/javascript, */*',
            'Accept-Encoding': 'gzip, deflate, br',
            'accept-language': 'en_US',
            'Connection': 'keep-alive',
            'content-type': 'application/json',
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
        }
        yield Request(url,
                      callback=self.after_search,
                      method='POST',
                      body=json.dumps(payload),
                      headers=headers)

    def after_search(self, response):
        print("========SEARCH HEADERS========")
        print(response.headers)
        print(response.request.headers)
        open_in_browser(response)
I used Chrome Developer Tools to inspect the request and response headers to make sure the cookies and other header details matched.
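The headers compared in Chrome can also be assembled in one place so they are easy to diff against the browser. The sketch below is an assumption-laden illustration, not the site's documented contract: `ASMX_URL`, `REFERER` and `build_asmx_call` are hypothetical names, the URLs are copied from the spider, and the top-level JSON keys are copied from the spider's payload on the assumption that they match the web method's parameter names. ASP.NET script services generally expect a JSON `Content-Type` on POSTed JSON; `X-Requested-With` and `Referer` are what a jQuery-style browser call typically sends, and whether this server checks them is unknown.

```python
import json
from urllib.parse import urlencode

# Endpoint and referring page copied from the spider in the question.
ASMX_URL = ("http://customer2.videcom.com/med-view/VARS/Public/WebServices/"
            "AvailabilityWS.asmx/GetFlightAvailability")
REFERER = ("http://customer2.videcom.com/med-view/VARS/Public/CustomerPanels/"
           "requirements.aspx?country=ng&lang=en")


def build_asmx_call(session_id, form_data):
    """Return (url, body, headers) for a JSON POST to the ASMX web method."""
    # URL-encode the session ID instead of concatenating it raw.
    url = ASMX_URL + "?" + urlencode({"VarsSessionID": session_id})
    # Top-level keys are assumed to be the web method's parameter names,
    # copied from the spider's payload.
    body = json.dumps({"FormData": form_data, "IsMMBChangeFlightMode": "false"})
    headers = {
        "Accept": "application/json, text/javascript, */*",
        "Content-Type": "application/json; charset=utf-8",
        # Typically added by jQuery XHRs; whether the server checks it is unknown.
        "X-Requested-With": "XMLHttpRequest",
        "Referer": REFERER,
    }
    # No Cookie header: Scrapy's CookiesMiddleware adds ASP.NET_SessionId.
    return url, body, headers
```

In `after_parse` this could replace the inline construction: `url, body, headers = build_asmx_call(VarsSessionID, payload["FormData"])`, then `yield Request(url, method='POST', body=body, headers=headers, callback=self.after_search)`.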

When I run the spider, I keep getting a 500 Internal Server Error, as shown below:

2017-05-02 11:52:47 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <POST http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b>
Cookie: ASP.NET_SessionId=v2kipt3kr2elvkat5buyajhs

2017-05-02 11:52:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b> (failed 1 times): 500 Internal Server Error
2017-05-02 11:52:49 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <POST http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b>
Cookie: ASP.NET_SessionId=v2kipt3kr2elvkat5buyajhs

2017-05-02 11:52:52 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b> (failed 2 times): 500 Internal Server Error
2017-05-02 11:52:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <POST http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b>
Cookie: ASP.NET_SessionId=v2kipt3kr2elvkat5buyajhs

2017-05-02 11:52:54 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <POST http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b> (failed 3 times): 500 Internal Server Error
2017-05-02 11:52:54 [scrapy.core.engine] DEBUG: Crawled (500) <POST http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b> (referer: http://customer2.videcom.com/med-view/VARS/Public/CustomerPanels/requirements.aspx?country=ng&lang=en)
2017-05-02 11:52:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=3d2048c4-2af5-4065-999f-8df6f162737b>: HTTP status code is not handled or not allowed
2017-05-02 11:52:54 [scrapy.core.engine] INFO: Closing spider (finished)
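The log shows RetryMiddleware retrying the 500 twice and HttpErrorMiddleware then dropping the response, so `after_search` never sees the server's error body. Scrapy's `handle_httpstatus_list` and `dont_retry` request meta keys change that. ASP.NET script services usually report exceptions as JSON with `Message`, `StackTrace` and `ExceptionType` fields, although the detail may be suppressed by the server's custom-errors configuration. `DEBUG_500_META` and `describe_asmx_error` below are hypothetical helpers for a debugging run.

```python
import json

# Request.meta keys understood by Scrapy's built-in middlewares:
#   handle_httpstatus_list -> HttpErrorMiddleware passes these statuses to the callback
#   dont_retry             -> RetryMiddleware does not retry the request
DEBUG_500_META = {"handle_httpstatus_list": [500], "dont_retry": True}


def describe_asmx_error(body):
    """Summarise the JSON error body an ASP.NET script service may return."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", "replace")
    try:
        err = json.loads(body)
    except ValueError:
        # Not JSON (e.g. an HTML error page): show the start of it instead.
        return body[:200]
    if not isinstance(err, dict):
        return body[:200]
    return "{}: {}".format(err.get("ExceptionType", "?"), err.get("Message", ""))
```

Passing `meta=DEBUG_500_META` to the `Request` in `after_parse` and logging `describe_asmx_error(response.body)` in `after_search` would show what the web service is complaining about instead of a bare status code.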

I need help understanding how to POST the data and receive the search results, the same way a search in the browser does. Thanks.
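Once the web service answers with a 200, its reply still has to be turned into the results page. ASP.NET AJAX script services (ASP.NET 3.5 and later) wrap the return value as `{"d": ...}`. The sketch assumes, based only on the question's description of a redirect, that `GetFlightAvailability` returns the results URL as a string; `redirect_url_from_asmx` is a hypothetical helper.

```python
import json
from urllib.parse import urljoin


def redirect_url_from_asmx(body, base_url):
    """Extract a redirect URL from an ASMX JSON reply and make it absolute."""
    data = json.loads(body)
    # Script services usually wrap the return value as {"d": ...}.
    if isinstance(data, dict) and "d" in data:
        data = data["d"]
    if not isinstance(data, str):
        # The assumption that the method returns a URL string does not hold.
        raise ValueError("expected a URL string, got %s" % type(data).__name__)
    # Resolve relative URLs against the web service URL.
    return urljoin(base_url, data)
```

In `after_search` this could feed a follow-up request, e.g. `yield Request(redirect_url_from_asmx(response.text, response.url), callback=self.parse_results)`, where `parse_results` is a hypothetical callback for the results page.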

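Replace the hard-coded __VIEWSTATE parameter in your request with a "fresh" one.

The view state is tied to some complex server-side state that becomes invalid after a while.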

Sometimes on ASP.NET sites FormRequest.from_response fails to capture this parameter correctly, so you may have to inspect response.body to work out how to extract __VIEWSTATE.
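The answer does not say why `from_response` can miss the field. One possible cause (an assumption here) is an ASP.NET AJAX partial-postback response: instead of HTML, its body is a pipe-delimited sequence of `length|type|id|content|` records, and the view state arrives as a `hiddenField` record. `extract_viewstate` is a hypothetical helper that checks that format first and then falls back to the default ASP.NET rendering of the hidden input (`name` before `value`).

```python
import re

# Partial-postback ("delta") bodies: ...|<length>|hiddenField|__VIEWSTATE|<value>|...
# Base64 view state never contains "|", so [^|]* captures the whole value.
_DELTA_VS = re.compile(r'\|hiddenField\|__VIEWSTATE\|([^|]*)\|')
# Regular pages: <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="..." />
# Assumes ASP.NET's default attribute order (name before value).
_HTML_VS = re.compile(r'name="__VIEWSTATE"[^>]*?value="([^"]*)"')


def extract_viewstate(body):
    """Return the current __VIEWSTATE from a response body, or None."""
    for pattern in (_DELTA_VS, _HTML_VS):
        match = pattern.search(body)
        if match:
            return match.group(1)
    return None
```

In a spider the body would be `response.text`, and the result would go into `formdata` in place of the hard-coded value.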


A good example is given here:

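Thanks for the reply. The __VIEWSTATE parameter is set when the page loads, and I have removed the hard-coded one. The main problem is the second request, in the after_parse method, which I send to the ASMX web service. __VIEWSTATE is not used or sent to the server anywhere in that request, yet I keep getting a 500 Internal Server Error.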
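Since __VIEWSTATE is not involved in the ASMX call, another place to look is the JSON body itself. One possible, unconfirmed cause of a 500 from an ASMX method is a body whose shape differs from what the method deserializes, for instance a list where it expects a string (the spider sends `'Origin': ['LOS']`). `diff_payloads` is a hypothetical helper that compares the Request Payload copied from Chrome's Network tab with the spider's payload, key by key and type by type.

```python
def diff_payloads(expected, actual, path="$"):
    """List differences between two decoded JSON payloads."""
    problems = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        # Walk the union of keys so missing and extra keys are both reported.
        for key in sorted(set(expected) | set(actual)):
            sub = "{}.{}".format(path, key)
            if key not in actual:
                problems.append(sub + ": missing")
            elif key not in expected:
                problems.append(sub + ": unexpected")
            else:
                problems.extend(diff_payloads(expected[key], actual[key], sub))
    elif type(expected) is not type(actual):
        problems.append("{}: type {} != {}".format(
            path, type(expected).__name__, type(actual).__name__))
    elif expected != actual:
        problems.append("{}: {!r} != {!r}".format(path, expected, actual))
    return problems
```

Usage: `diff_payloads(json.loads(payload_copied_from_chrome), payload)`; an empty list means the spider sends the same structure as the browser, which would point the investigation back at headers or session state.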