Using Mechanize with Google Docs in Ruby

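I'm trying to use Mechanize to log in to Google Docs so that I can scrape some data that isn't available through the API, but I keep getting a 404 when I try to follow the meta redirect: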

require 'rubygems'
require 'mechanize'

USERNAME = "..."
PASSWORD = "..."

LOGIN_URL = "https://www.google.com/accounts/Login?hl=en&continue=http://docs.google.com/"

agent = Mechanize.new
login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = USERNAME
login_form.Passwd = PASSWORD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

puts "redirect: #{redirect}"

followed_page = agent.get(redirect) # throws a HTTPNotFound exception

pp followed_page
Does anyone know why this doesn't work?

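Andy, you're awesome! Your code helped me get my own script working and log in to a Google account. After a few hours I found the error in yours: it comes down to HTML escaping. As far as I can tell, Mechanize automatically escapes the URI it receives as the argument to the get method, so the URI taken from the meta tag, which is already escaped, ends up escaped a second time. My solution is: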

require 'rubygems'
require 'mechanize'
require 'logger'

EMAIL  = ".."
PASSWD = ".."
agent = Mechanize.new{ |a| a.log = Logger.new("mech.log")}
agent.user_agent_alias = 'Linux Mozilla'
agent.open_timeout = 3
agent.read_timeout = 4
agent.keep_alive   = true
agent.redirect_ok  = true
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en"

login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = EMAIL
login_form.Passwd = PASSWD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

# Drop the last query parameter (the already-escaped continue) and append a fresh one.
target = redirect.split('&')[0..-2].join('&') + "&continue=https://www.google.com/adplanner"
puts target
followed_page = agent.get(target)
pp followed_page
This works fine for me. I replaced the continue parameter from the meta tag (which is already escaped) with a new one.
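
The split('&')[0..-2] trick above only works when continue happens to be the last query parameter. As a rough sketch, the same replacement could be done with Ruby's standard URI library so that parameter order doesn't matter. The helper name with_continue and the sample redirect URL below are made up for illustration, not taken from Google's actual response, and this has not been tested against a live login:

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): return `url` with its
# `continue` query parameter replaced by `target`, wherever it appears.
def with_continue(url, target)
  uri = URI.parse(url)
  # Decode the query into [key, value] pairs and drop any existing continue.
  params = URI.decode_www_form(uri.query.to_s).reject { |key, _| key == 'continue' }
  params << ['continue', target]
  # Re-encode the whole query exactly once.
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Made-up redirect URL, standing in for login_response_page.meta[0].uri.to_s
sample = 'https://www.google.com/accounts/CheckCookie?continue=http%3A%2F%2Fdocs.google.com%2F&hl=en'
puts with_continue(sample, 'https://www.google.com/adplanner')
```

The returned string would then be passed to agent.get in place of the hand-built one. Whether Mechanize leaves the already-encoded continue value untouched may depend on the Mechanize version, so it is worth checking the log before relying on it.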
