Python: scraping WizzAir

I'm trying to scrape WizzAir for personal use and can't figure out what is wrong with my code. Could it be an incorrect payload object, or the cookies?

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36",
    "Accept": "application/json, text/plain, */*",
    "Accept-Encoding": "gzip, deflate, sdch, br",
    "Accept-Language": "en-US,en;q=0.8,lt;q=0.6,ru;q=0.4",
    "Origin": "https://wizzair.com",
    "Referer": "https://wizzair.com/"

}

search_url = "https://wizzair.com/lt-LT/FlightSearch"
session = requests.Session()
r = session.get("https://be.wizzair.com/3.8.2/Api/asset/yellowRibbon", headers=headers, allow_redirects=False)
session_id = r.cookies["ASP.NET_SessionId"]

cookies = {
    "ASP.NET_SessionId": session_id,
    "HomePageSelector": "FlightSearch",
}

# wizz_url = "https://be.wizzair.com/3.8.2/Api/search/search"
wizz_url = "https://be.wizzair.com/3.8.2/Api/asset/farechart"
payload = {"flightList":[{"departureStation":"VNO","arrivalStation":"FCO","departureDate":"2017-02-20"}],"adultCount":1,"childCount":0,"infantCount":0,"wdc":True, "dayInterval":3}
r = session.post(url=wizz_url,data=payload,headers=headers, cookies=cookies)
print(r.content)


>>> {"validationCodes":["FlightCount_MustBe_OneOrTwo"]}

I ran the code below, even without a session and cookies, and got some data back.

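You have to send the payload as JSON, using json=payload. With data=payload, requests form-encodes the dictionary instead of serializing it to JSON, so the nested flightList does not reach the server in the shape it expects.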
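To see the difference between data= and json= without contacting the server, one option is to build both requests and inspect them before they are sent. This is only a sketch: it uses requests.Request(...).prepare(), reuses the payload and URL from the question, and makes no claim about how the server reacts beyond the error already shown above.

```python
import json

import requests

url = "https://be.wizzair.com/3.8.2/Api/search/search"
payload = {
    "flightList": [
        {"departureStation": "VNO", "arrivalStation": "FCO", "departureDate": "2017-02-20"}
    ],
    "adultCount": 1,
    "childCount": 0,
    "infantCount": 0,
    "wdc": True,
    "dayInterval": 3,
}

# Build the requests locally; nothing is sent over the network.
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()

# data= produces a form-encoded body, json= produces a JSON body.
print(form_req.headers["Content-Type"])
print(form_req.body)
print(json_req.headers["Content-Type"])
print(json_req.body)

# Only the JSON body can be parsed back into the original structure.
round_trip = json.loads(json_req.body)
```

Printing both bodies side by side makes it easy to check that only the json= variant carries flightList as a list of flight objects.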

If you do need cookies and headers, use a session so that you don't have to copy them from one request to the next.

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36",
    #"Accept": "application/json, text/plain, */*",
    #"Accept-Encoding": "gzip, deflate, sdch, br",
    #"Accept-Language": "en-US,en;q=0.8,lt;q=0.6,ru;q=0.4",
}

s = requests.Session()
s.headers.update(headers)

# to get cookies
r = s.get("https://www.wizzair.com/")

payload = {
    "flightList":[
        {
            "departureStation": "VNO",
            "arrivalStation": "FCO",
            "departureDate": "2017-02-20"
        }
    ],
    "adultCount": 1,
    "childCount": 0,
    "infantCount": 0,
    "wdc": True,
    "dayInterval": 3
}

url = 'https://be.wizzair.com/3.8.2/Api/search/search'

r = s.post(url, json=payload)

print(r.text)

data = r.json()

print(data['outboundFlights'][0]['flightNumber'])
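The last line above raises a KeyError if the server answers with a validation error like the one in the question. A slightly more defensive way to read the response could look like the sketch below. The helper name first_flight_number is hypothetical, the key names are taken from the responses shown in this thread, and the sample dictionaries are made up for illustration.

```python
def first_flight_number(data):
    """Return the first outbound flight number from a parsed response.

    `data` is the dict returned by r.json(). Only the keys seen in this
    thread are used; the rest of the response schema is unknown here.
    """
    codes = data.get("validationCodes")
    if codes:
        # The server rejected the request, e.g. FlightCount_MustBe_OneOrTwo.
        raise ValueError("request rejected: " + ", ".join(codes))

    flights = data.get("outboundFlights") or []
    if not flights:
        return None  # valid request, but no flights in the answer
    return flights[0].get("flightNumber")


# Made-up sample data, only to show the two code paths.
ok_sample = {"outboundFlights": [{"flightNumber": "XX 0000"}]}
bad_sample = {"validationCodes": ["FlightCount_MustBe_OneOrTwo"]}

print(first_flight_number(ok_sample))
try:
    first_flight_number(bad_sample)
except ValueError as exc:
    print(exc)
```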

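BTW: with a session you can set the headers once at the start using session.headers instead of passing them in every request. You also don't have to copy cookies from one request to the next; the session does that automatically.

BTW: try your URL in a browser and you will see that https://be.wizzair.com/3.8.2/Api/asset/culture does not work with GET.

Sorry, I tested several URLs from the network tab and left the wrong one in, but that is clearly not the main problem here.

It was just a comment, not an answer.

Is there any documentation for this API?

I doubt there is any public API documentation. I found it using the network tab in Chrome's devtools.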
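The point about sessions can be checked offline. The sketch below sets a header and a cookie once on the session and then prepares a request without sending it; Session.prepare_request merges the session state into the request. The shortened User-Agent and the request body are placeholders, and the cookie name and value are the ones from the question.

```python
import requests

s = requests.Session()

# Set once on the session instead of on every request.
s.headers.update({"User-Agent": "Mozilla/5.0"})
s.cookies.set("HomePageSelector", "FlightSearch")

# Prepare (but do not send) a request through the session.
req = requests.Request(
    "POST",
    "https://be.wizzair.com/3.8.2/Api/search/search",
    json={"adultCount": 1},  # placeholder body, not a full search payload
)
prepared = s.prepare_request(req)

# Both the session header and the session cookie are attached automatically.
print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```

Cookies that the server sets in a response are stored in s.cookies in the same way, which is why the manual cookies dictionary from the question is not needed.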