Python 维扎尔刮削
我正试着刮WizzAir以备个人使用。无法理解我的代码有什么问题。可能是不正确的有效负载对象或cookie吗Python 维扎尔刮削,python,web-scraping,python-requests,Python,Web Scraping,Python Requests,我正试着刮WizzAir以备个人使用。无法理解我的代码有什么问题。可能是不正确的有效负载对象或cookie吗 import requests headers = { "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36", "Accept": "applicati
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36",
"Accept": "application/json, text/plain, */*",
"Accept-Encoding": "gzip, deflate, sdch, br",
"Accept-Language": "en-US,en;q=0.8,lt;q=0.6,ru;q=0.4",
"Origin": "https://wizzair.com",
"Referer": "https://wizzair.com/"
}
search_url = "https://wizzair.com/lt-LT/FlightSearch"
session = requests.Session()
r = session.get("https://be.wizzair.com/3.8.2/Api/asset/yellowRibbon", headers=headers, allow_redirects=False)
session_id = r.cookies["ASP.NET_SessionId"]
cookies = {
"ASP.NET_SessionId": session_id,
"HomePageSelector": "FlightSearch",
}
# wizz_url = "https://be.wizzair.com/3.8.2/Api/search/search"
wizz_url = "https://be.wizzair.com/3.8.2/Api/asset/farechart"
payload = {"flightList":[{"departureStation":"VNO","arrivalStation":"FCO","departureDate":"2017-02-20"}],"adultCount":1,"childCount":0,"infantCount":0,"wdc":True, "dayInterval":3}
r = session.post(url=wizz_url,data=payload,headers=headers, cookies=cookies)
print r.content
>>> {"validationCodes":["FlightCount_MustBe_OneOrTwo"]}
我运行这个-即使没有会话和cookie-并获得一些数据 您必须使用JSON=payload将其作为JSON发送 如果您必须使用cookie和标头,那么就使用会话,这样您就不必将cookie和标头从一个请求复制到另一个请求
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36",
#"Accept": "application/json, text/plain, */*",
#"Accept-Encoding": "gzip, deflate, sdch, br",
#"Accept-Language": "en-US,en;q=0.8,lt;q=0.6,ru;q=0.4",
}
s = requests.Session()
s.headers.update(headers)
# to get cookies
r = s.get("https://www.wizzair.com/")
payload = {
"flightList":[
{
"departureStation": "VNO",
"arrivalStation": "FCO",
"departureDate": "2017-02-20"
}
],
"adultCount": 1,
"childCount": 0,
"infantCount": 0,
"wdc": True,
"dayInterval": 3
}
url = 'https://be.wizzair.com/3.8.2/Api/search/search'
r = s.post(url, json=payload)
print(r.text)
data = r.json()
print(data['outboundFlights'][0]['flightNumber'])
顺便说一句:对于会话,您可以在开始时使用Session.header设置头,而不必在每个请求中都设置头。使用会话,您不必将cookies从一个请求复制到另一个请求,会话会自动执行。顺便说一句:在浏览器中尝试您的URL,您会看到这一点https://be.wizzair.com/3.8.2/Api/asset/culture 不适用于GET。抱歉,我测试了网络标签上的几个URL,并留下了错误的一个,但显然这不是这里的主要问题。它只是注释,而不是答案。是否有此API的文档?我怀疑是否有公共API文档。我使用Chrome的devtools网络选项卡找到了它。
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36",
#"Accept": "application/json, text/plain, */*",
#"Accept-Encoding": "gzip, deflate, sdch, br",
#"Accept-Language": "en-US,en;q=0.8,lt;q=0.6,ru;q=0.4",
}
s = requests.Session()
s.headers.update(headers)
# to get cookies
r = s.get("https://www.wizzair.com/")
payload = {
"flightList":[
{
"departureStation": "VNO",
"arrivalStation": "FCO",
"departureDate": "2017-02-20"
}
],
"adultCount": 1,
"childCount": 0,
"infantCount": 0,
"wdc": True,
"dayInterval": 3
}
url = 'https://be.wizzair.com/3.8.2/Api/search/search'
r = s.post(url, json=payload)
print(r.text)
data = r.json()
print(data['outboundFlights'][0]['flightNumber'])