Issue scraping with python/requests

Tags: python, web, python-requests, screen-scraping

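I'm trying to scrape price data for specific flights from United Airlines' website (half for educational purposes, half to monitor fares for myself).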

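I've managed to do this with Selenium, but it's a fairly clunky implementation. Along the way I noticed that after the initial redirect there is an AJAX call whose JSON response contains everything I want. I tried hitting that endpoint directly, passing the POST parameters I saw in the Network tab of the dev tools, but it didn't work. Then I noticed a "cart id" field that looks dynamic while the others look static, so I pulled it from the pre-redirect form submission page and inserted it into the POST. The response I get is still `status: fail` with the message "Sorry, united.com was unable to complete your request."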

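I'm not sure what data I'm missing from the POST. I also hit the form submission page first so that the persistent session object picks up the cookies, which I thought would help, but no dice. What am I missing? You can see the actual response I'm looking for by opening the first URL below in a browser and watching the Network tab: the first XHR, named "rev", contains both the POST form data I'm trying to mimic and the JSON I want.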

```python
import json
import requests
from lxml import html

with requests.Session() as s:
    # Load the results page first: the session keeps its cookies, and the page holds the cart id
    formsubmitpage = s.get('https://www.united.com/ual/en/us/flight-search/book-a-flight/results/rev?f=sfo&t=tpe&d=2016-01-20&r=2016-01-26&sc=1,7&px=1&taxng=1&idx=1')
    doc = html.fromstring(formsubmitpage.text)
    cartid = doc.xpath('//a[@class="no-rtad"]/@data-cartid')[0]
    print(cartid)
    params = {"Revise":False,"UnaccompaniedMinorDisclamer":False,"ConfirmationID":None,"searchTypeMain":"roundTrip","Origin":"sfo","Destination":"tpe","DepartDate":"Jan 20, 2016","ReturnDate":"Jan 26, 2016","awardTravel":False,"MaxTrips":None,"numberOfTravelers":1,"numOfAdults":1,"numOfSeniors":0,"numOfChildren04":0,"numOfChildren03":0,"numOfChildren02":0,"numOfChildren01":0,"numOfInfants":0,"numOfLapInfants":0,"travelerCount":1,"revisedTravelerKeys":None,"revisedTravelers":None,"OriginalReservation":None,"RiskFreePolicy":None,"IsUnAccompaniedMinor":False,"MilitaryTravelType":None,"MilitaryOrGovernmentPersonnelStateCode":None,"tripLength":6,"IsParallelFareWheelCallEnabled":False,"flexMonth":None,"flexMonth2":None,"SortType":None,"cboMiles":None,"cboMiles2":None,
              "Trips":[{"DestinationAll":False,"returnARC":None,"connections":None,"nonStopOnly":True,"nonStop":True,"oneStop":False,"twoPlusStop":False,"ChangeType":0,"DepartDate":"Jan 20, 2016","ReturnDate":None,"PetIsTraveling":False,"PreferredTime":"","PreferredTimeReturn":None,"Destination":"TPE","Index":1,"Origin":"SFO","Selected":False,"FormatedDepartDate":"Wed, Jan 20, 2016","OriginCorrection":None,"DestinationCorrection":None,"OriginAll":False,"Flights":None},
                       {"DestinationAll":False,"returnARC":None,"connections":None,"nonStopOnly":True,"nonStop":True,"oneStop":False,"twoPlusStop":False,"ChangeType":0,"DepartDate":"Jan 26, 2016","ReturnDate":None,"PetIsTraveling":False,"PreferredTime":"","PreferredTimeReturn":None,"Destination":"SFO","Index":2,"Origin":"TPE","Selected":False,"FormatedDepartDate":"Tue, Jan 26, 2016","OriginCorrection":None,"DestinationCorrection":None,"OriginAll":False,"Flights":None}],
              "nonStopOnly":1,"CalendarOnly":False,"InitialShop":True,"IsSearchInjection":False,"CartId":cartid,"CellIdSelected":None,"BBXSession":None,"SolutionSetId":None,"SimpleSearch":True,"RequeryForUpsell":False,"RequeryForPOSChange":False,"YBMAlternateService":False,"ShowClassOfServiceListPreference":False,"SelectableUpgradesOriginal":None,"RegionalPremierUpgradeBalance":0,"GlobalPremierUpgradeBalance":0,"RegionalPremierUpgrades":None,"GlobalPremierUpgrades":None,"FormattedAccountBalance":None,"GovType":None,"TripTypes":0,"flexible":False,"flexibleAward":False,"FlexibleDaysAfter":0,"FlexibleDaysBefore":0,"hiddenPreferredConn":None,"hiddenUnpreferredConn":None,"carrierPref":0,"chkFltOpt":0,"portOx":0,"travelwPet":0,"NumberOfPets":0,"cabinType":0,"cabinSelection":"ECONOMY","awardCabinType":0,"FareTypes":0,"FareWheelOnly":False,"EditSearch":False,"buyUpgrade":0,"offerCode":None,"TVAOfferCodeLastName":None,"ClassofService":None,"UpgradeType":None,"BillingAddressCountryCode":None,"BillingAddressCountryDescription":None,"IsPassPlusFlex":False,"IsPassPlusSecure":False,"IsOffer":False,"IsMeetingWorks":False,"IsValidPromotion":False,"CalendarDateChange":None,"CoolAwardSpecials":False,"LastResultId":None,"IncludeLmx":False,"NGRP":False,"calendarStops":0,"isReshopPath":False}
    redirect_endpoint = s.post('https://www.united.com/ual/en/us/flight-search/book-a-flight/flightshopping/getflightresults/rev', data=json.dumps(params))
    print(redirect_endpoint.text)  # denied!
```
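The question relies on the session to carry cookies from the first GET into the later POST. A minimal offline sketch of that mechanism, assuming a hypothetical cookie name and value (`example_session=abc123`, not anything united.com actually sets): `Session.prepare_request()` shows what the session would attach to a request without sending it.

```python
import requests

RESULTS_URL = "https://www.united.com/ual/en/us/flight-search/book-a-flight/flightshopping/getflightresults/rev"

with requests.Session() as s:
    # Stand-in for a cookie the first GET would have stored (hypothetical name/value).
    s.cookies.set("example_session", "abc123", domain="www.united.com", path="/")

    # prepare_request() merges the session's cookies and default headers into
    # the request the same way s.post() does before sending, but sends nothing.
    prepared = s.prepare_request(
        requests.Request("POST", RESULTS_URL, json={"CartId": "placeholder"})
    )
    print(prepared.headers.get("Cookie"))

    # A request to a different host does not receive the cookie.
    other = s.prepare_request(requests.Request("GET", "https://example.com/"))
    print(other.headers.get("Cookie"))
```

Inspecting `prepared.headers` this way is a cheap way to compare what the script would send against the "rev" XHR shown in the browser's Network tab.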

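In the `s.post` call, you probably meant either `data=params` (to send form data) or `json=params` (to send the JSON as the request body).

The `params` keyword is for the query string.

See the documentation.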
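The three keyword arguments the answer distinguishes can be compared offline by preparing requests without sending them. A minimal sketch, assuming a small hypothetical payload in place of the question's much larger dict:

```python
import json

import requests

URL = "https://www.united.com/ual/en/us/flight-search/book-a-flight/flightshopping/getflightresults/rev"
payload = {"Origin": "SFO", "Destination": "TPE", "nonStopOnly": 1}  # hypothetical, trimmed down


def prepare(**kwargs):
    """Build the request that requests would send, without sending it."""
    return requests.Request("POST", URL, **kwargs).prepare()


form_req = prepare(data=payload)     # form-encoded body
json_req = prepare(json=payload)     # JSON body plus Content-Type: application/json
query_req = prepare(params=payload)  # payload goes into the URL; the body stays empty

for name, req in [("data=", form_req), ("json=", json_req), ("params=", query_req)]:
    print(name, req.headers.get("Content-Type"), req.url, req.body)
```

With the question's real payload, form encoding cannot faithfully represent the nested `Trips` list of dicts, which is one reason `json=` is the better fit if the endpoint expects a JSON body.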

Yes, I noticed that right after posting and changed it to the first option (`data=params`), but still no luck. That fixed the "permission denied" problem, but the response status is still fail.

Use `json=params` instead of `data=json.dumps(params)`. It will set the Content-Type header correctly.

Unfortunately that made no difference. Scratch that, I lied: I'm still using `json.dumps`.

If you use `json=` instead of `data=`, requests does the conversion for you and sets the header.

Curious why `data=json.dumps(params)` doesn't work, though. Is it the header?

Yes, it's in the docs linked above. When you use `json=params`, requests sets the `Content-Type` header to `application/json`; with `data=json.dumps(params)` it can't, because requests just receives a `str` and has no way of knowing it contains JSON. If you want, you can also set that header yourself by building a `Request` object instead of using the `post()` convenience function.
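The last comment's point can be checked without contacting the site. A minimal sketch, assuming a hypothetical trimmed-down payload: prepare the same POST with a pre-serialized string, with `json=`, and with a pre-serialized string plus an explicit header built into a `Request` object, then compare the `Content-Type` each one ends up with.

```python
import json

import requests

URL = "https://www.united.com/ual/en/us/flight-search/book-a-flight/flightshopping/getflightresults/rev"
payload = {"Origin": "SFO", "Destination": "TPE"}  # hypothetical, trimmed-down payload

# Pre-serialized string: requests sends the text as-is but cannot know it is
# JSON, so it adds no Content-Type header.
as_string = requests.Request("POST", URL, data=json.dumps(payload)).prepare()

# json=: requests serializes the dict itself and sets Content-Type: application/json.
as_json = requests.Request("POST", URL, json=payload).prepare()

# Pre-serialized string plus an explicit header, built as a Request object.
as_string_with_header = requests.Request(
    "POST",
    URL,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
).prepare()

for label, req in [("data=json.dumps", as_string), ("json=", as_json),
                   ("data= + header", as_string_with_header)]:
    print(label, req.headers.get("Content-Type"))
```

`s.post()` also accepts a `headers=` argument directly, so building the `Request` by hand is optional; `json=` remains the shortest way to get both the serialization and the header.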