Ruby: Logging in to Booking.com with Mechanize


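I am trying to log in to Booking.com with Mechanize via the following URL:

So far I have not been able to get through the login process. I suspect they use a JavaScript function to set a csrf_token when the form is submitted. Here is the code I am using: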

require 'mechanize'

login_url = "https://admin.booking.com/hotel/hoteladmin"
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
# Disables certificate verification; acceptable for debugging only
agent.verify_mode = OpenSSL::SSL::VERIFY_NONE

# Get the login page
page = agent.get(login_url)

# my_username and my_password are defined elsewhere
form = page.form_with(name: 'myform')
form.loginname = my_username
form.password = my_password
# The page's JavaScript falls back to this value when its token is empty
form.add_field!("csrf_token", "empty-token")

# Submit the form
page = form.submit(form.button_with(name: "Login"))
When I load the page in a browser, I get:

var token = '..................EXTRA-LONG-TOKEN..................' || 'empty-token',
But when I inspect the page fetched by Mechanize, I get:

var token = '' || 'empty-token',
The full page body as fetched by Mechanize can be found here:


So they use JavaScript to put this variable into a new field that is created when the form is submitted:

if (
    form &&
    form.method &&
    form.method.toLowerCase() === 'post' &&
    typeof form.elements.csrf_token === 'undefined'
) {
    input       =  doc.createElement( 'input' );
    input.name  = 'csrf_token';
    input.type  = 'hidden';
    input.value =  token;

    form.appendChild( input );
}
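
The same fallback logic can be reproduced on the Ruby side. The following is only a sketch: the helper names `extract_csrf_token` and `csrf_token_or_default` are made up for illustration, and it assumes the token always appears in the page as a single-quoted `var token = '...'` assignment, as in the two excerpts above.

```ruby
# Hypothetical helper: pull the value assigned to `var token` out of raw HTML.
# Returns nil when the assignment is missing or the token is empty,
# which is the case for the page fetched by Mechanize above.
def extract_csrf_token(html)
  match = html.match(/var\s+token\s*=\s*'([^']*)'/)
  return nil if match.nil? || match[1].empty?

  match[1]
end

# Mirrors the JavaScript expression: token || 'empty-token'.
def csrf_token_or_default(html, default = 'empty-token')
  extract_csrf_token(html) || default
end
```

With Mechanize, the result of `csrf_token_or_default(page.body)` could be passed to `form.add_field!("csrf_token", ...)` in place of the hard-coded string. Whether the server actually checks this value is not established by anything shown here.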

I also tried watching the Network tab in Firebug, without success. When the form is submitted, the following sequence occurs:

302 - POST - login.html
302 - GET  - https://admin.booking.com/hotel/hoteladmin/index-hotel.html?page=&lang=xu&ses=89abb0da735818bc6252d69ece255276&t=1429195712.93074
302 - GET  - https://admin.booking.com/hotel/hoteladmin/extranet_ng/manage/index.html?lang=xu&ses=89abb0da735818bc6252d69ece255276&hotel_id=XXXXXX&t=1429195713.11779
200 - GET  - /home.html
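When I inspect the POST request, this is what I can see under "Request data":

So I don't know whether the csrf_token above is used at all, or where it goes. I don't know whether the csrf_token is what prevents me from logging in.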


Here are the request/response headers for a successful login from my browser:

---------- Request ----------
Host: admin.booking.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: https://admin.booking.com/hotel/hoteladmin/login.html
Cookie: cwd-extranet=1; ecid=RtSy3w%2Fk5BG5Z67OY8E2rQZz; slan=xu; auth_token=569054884; ut=e; _ga=GA1.2.357900853.1429171802
Connection: keep-alive
---------- Response ----------
Connection: keep-alive
Content-Type: text/html; charset=UTF-8
Date: Thu, 16 Apr 2015 14:57:24 GMT
Location: /hotel/hoteladmin/index-hotel.html?page=&lang=xu&ses=8df70f6f7699cf5c5d63271fbbb47bb1&t=1429196244.67621
Server: nginx
Set-Cookie: cwd-extranet=1; path=/; expires=Tue, 14-Apr-2020 14:57:24 GMT
slan=xu; path=/; expires=Wed, 18-May-2033 03:33:20 GMT; HttpOnly
Strict-Transport-Security: max-age=2592000
Transfer-Encoding: chunked
And here are the headers from Mechanize, where the login fails (no Location in the response headers?):


Thanks for your help.

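I managed to solve the problem without handling the CSRF token at all.

What I did was follow the POST/GET sequence shown in Firebug. The only thing that matters is the ses token, which can be found in a hidden field on the login form.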
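
Reading that hidden field could look like the sketch below. The helper name `extract_ses` is hypothetical, and the code assumes the field is rendered as an ordinary `<input>` tag with `name="ses"` and a quoted `value` attribute; the actual markup of the login page is not shown here.

```ruby
# Hypothetical helper: find <input ... name="ses" ... value="..."> in raw HTML
# and return its value, or nil when no such field exists.
# Attribute order inside the tag does not matter.
def extract_ses(html)
  tag = html[/<input\b[^>]*\bname=["']ses["'][^>]*>/i]
  return nil unless tag

  tag[/\bvalue=["']([^"']*)["']/i, 1]
end
```

If the page is fetched with Mechanize instead, the same value should be reachable through the form object, for example `page.form_with(name: 'myform')['ses']`, assuming the field belongs to that form.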

So for the login POST we have:

require 'net/http'
require 'uri'

# token, username, password and cookie come from the earlier GET of the login page
uri = URI.parse("https://admin.booking.com/hotel/hoteladmin/login.html")
# encode_www_form escapes each value; URI.encode was removed in Ruby 3.0
data = URI.encode_www_form(lang: "en", login: "Login", ses: token, loginname: username, password: password)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri)
request.body = data
request['Cookie'] = cookie
response = http.request(request)
cookie = response['set-cookie']
location = response['location']
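
One detail worth noting: a raw `set-cookie` header also carries attributes such as `path` and `expires`, and when several cookies are set Net::HTTP joins the header lines with a comma. Sending that string back as-is seems to have worked here, but a cleaner `Cookie` header can be built as sketched below. The helper name `cookie_header` is hypothetical.

```ruby
# Hypothetical helper: turn Set-Cookie header lines into a Cookie request header.
# Only the leading name=value pair of each line is kept; attributes such as
# path, expires and HttpOnly are instructions for the client and are not sent back.
def cookie_header(set_cookie_lines)
  pairs = Array(set_cookie_lines).map { |line| line.to_s.split(';', 2).first.to_s.strip }
  pairs.reject(&:empty?).join('; ')
end
```

It could be fed with `response.get_fields('set-cookie')`, which returns each header line separately (or nil when there is none). This sketch does not merge the new cookies with ones received on earlier responses.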
We then follow the redirects, using the cookie and location from the previous response, until we get a 200 response code. Something like:

uri = URI.parse("https://admin.booking.com#{location}")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Get.new(uri.request_uri)
request['Cookie'] = cookie
response = http.request(request)
# Carry the new cookie and location into the next hop
cookie = response['set-cookie']
location = response['location']
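
Repeating that GET by hand for every hop can be wrapped in a loop. The following is a minimal sketch, not the exact code used: `follow_redirects`, `DEFAULT_FETCH` and the `max_hops` limit are names and choices introduced here for illustration. The `fetch` argument is injectable so the loop can be exercised without touching the network.

```ruby
require 'net/http'
require 'uri'

# Host taken from the snippets above.
BASE_URL = 'https://admin.booking.com/'

# Default fetcher: one GET over HTTPS with the given Cookie header.
DEFAULT_FETCH = lambda do |uri, cookie|
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie if cookie
  http.request(request)
end

# Hypothetical helper: follow Location headers until a non-3xx response arrives.
# Raises if more than max_hops requests would be needed.
def follow_redirects(location, cookie, fetch: DEFAULT_FETCH, max_hops: 10)
  uri = URI.join(BASE_URL, location)
  max_hops.times do
    response = fetch.call(uri, cookie)
    next_location = response['location']
    return response unless response.code.to_s.start_with?('3') && next_location

    # Keep the previous cookie when the response does not set a new one
    cookie = response['set-cookie'] || cookie
    uri = URI.join(uri.to_s, next_location)
  end
  raise "gave up after #{max_hops} redirects"
end
```

After the login POST, a call such as `follow_redirects(location, cookie)` would replace the repeated GET blocks and return the final response.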