Ruby on Rails: Using OpenURI, how do I get the contents of a redirected page?

I want to get data from this page:

http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=0656887000494793

but that page forwards to:

http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?execution=eXs1

So when I try to fetch the data using OpenURI's open, it throws a RuntimeError saying "HTTP redirection loop:".

I'm really not sure how to get that data once it redirects and throws that error.

The site seems to be doing some of its redirect logic based on the session. If you don't send back the session cookie it sets on the first request, you end up in a redirect loop. That's a bad implementation on their part.

However, I tried passing the cookies back to them and couldn't get it to work, so I can't be completely sure that's all that is going on here.
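For reference, a minimal sketch of what "sending the session cookie back" looks like with plain Net::HTTP. The helper names `cookie_header` and `follow_with_cookies` are mine, not from the question; whether this actually breaks the Canada Post loop depends on their session handling:

```ruby
require "net/http"
require "uri"

# Join a cookie jar (name => value) into a Cookie request header.
def cookie_header(jar)
  jar.map { |name, value| "#{name}=#{value}" }.join("; ")
end

# Fetch url, re-sending any cookies the server sets, and follow
# up to `limit` redirects by hand instead of relying on open-uri.
def follow_with_cookies(url, limit = 5)
  jar = {}
  limit.times do
    uri = URI.parse(url)
    request = Net::HTTP::Get.new(uri)
    request["Cookie"] = cookie_header(jar) unless jar.empty?
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request(request)
    end
    # Remember every cookie the server sets (ignoring attributes like Path).
    Array(response.get_fields("Set-Cookie")).each do |field|
      name, value = field.split(";").first.split("=", 2)
      jar[name] = value
    end
    return response.body unless response.is_a?(Net::HTTPRedirection)
    url = URI.join(url, response["Location"]).to_s
  end
  raise "too many redirects"
end
```

Mechanize (suggested in the other answer) does this same cookie and redirect bookkeeping automatically.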

You need a tool like Mechanize. From its description:

    The Mechanize library is used for automating interaction with
    websites. Mechanize automatically stores and sends cookies, follows
    redirects, and can follow links and submit forms. Form fields can be
    populated and submitted. Mechanize also keeps track of the sites that
    you have visited as a history.

which is exactly what you need. So:

sudo gem install mechanize

and you're ready to rock.
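A minimal sketch of how that might look. The helper name `fetch_page` is mine, and the require is guarded so the sketch degrades gracefully if the gem is not installed:

```ruby
begin
  require "mechanize"
  HAVE_MECHANIZE = true
rescue LoadError
  HAVE_MECHANIZE = false # install with: sudo gem install mechanize
end

# Mechanize keeps the session cookie from the first response and sends it
# back when following the redirect, so the loop the question describes
# never happens.
def fetch_page(url)
  agent = Mechanize.new
  agent.get(url).body
end

# Example (uncomment to actually hit the network):
# puts fetch_page("http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=0656887000494793")
```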

While Mechanize is a wonderful tool, I prefer to cook my own thing.

If you are serious about parsing, you can take a look at this code. It crawls thousands of sites internationally every day, and as far as I have researched and tweaked it, there is no more stable approach, and it also lets you highly customize things later to your needs.

require "open-uri"
require "timeout"
require "zlib"
require "nokogiri"
require "sanitize"
require "htmlentities"
require "readability"

# NOTE: this code predates modern Ruby; URI.decode/URI.encode, Iconv and
# the bare timeout() helper are deprecated or removed in current versions.
def crawl(url_address)
  self.errors = Array.new
  begin
    begin
      url_address = URI.parse(url_address)
    rescue URI::InvalidURIError
      # The raw string contains characters URI.parse rejects:
      # decode it, re-encode it cleanly, then parse again.
      url_address = URI.decode(url_address)
      url_address = URI.encode(url_address)
      url_address = URI.parse(url_address)
    end
    url_address.normalize!
    stream = ""
    # SHINSO_HEADERS is an application-specific hash of request headers.
    Timeout.timeout(8) { stream = url_address.open(SHINSO_HEADERS) }
    if stream.size > 0
      # base_uri reflects any redirects, so this is the URL actually fetched.
      url_crawled = URI.parse(stream.base_uri.to_s)
    else
      self.errors << "Server said status 200 OK but document file is zero bytes."
      return
    end
  rescue Exception => exception
    self.errors << exception
    return
  end
  # Extract response metadata before HTML parsing.
  self.url_posted       = url_address.to_s
  self.url_parsed       = url_crawled.to_s
  self.url_host         = url_crawled.host
  self.status           = stream.status
  self.content_type     = stream.content_type
  self.content_encoding = stream.content_encoding
  self.charset          = stream.charset
  if    stream.content_encoding.include?('gzip')
    document = Zlib::GzipReader.new(stream).read
  elsif stream.content_encoding.include?('deflate')
    # Inflate decompresses; Zlib::Deflate would compress the stream instead.
    document = Zlib::Inflate.inflate(stream.read)
  #elsif stream.content_encoding.include?('x-gzip') or
  #elsif stream.content_encoding.include?('compress')
  else
    document = stream.read
  end
  self.charset_guess = CharGuess.guess(document)
  # Transcode to UTF-8 unless the guessed charset already is UTF-8.
  unless self.charset_guess.blank? or ['utf-8', 'utf8'].include?(self.charset_guess.downcase)
    document = Iconv.iconv("UTF-8", self.charset_guess, document).to_s
  end
  document = Nokogiri::HTML.parse(document, nil, "utf8")
  # Nokogiri normalizes HTML tag names to lowercase, so one XPath suffices.
  document.xpath('//script').remove
  # Rewrite relative src attributes to absolute URLs.
  document.xpath('//*[@src]').each do |item|
    item.set_attribute('src', make_absolute_address(item['src']))
  end
  # Strip HTML comments.
  document = document.to_s.gsub(/<!--(.|\s)*?-->/, '')
  self.content = Nokogiri::HTML.parse(document, nil, "utf8")
end
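One detail worth flagging: decompressing a deflate-encoded body needs Zlib::Inflate, since Zlib::Deflate compresses rather than decompresses. A self-contained round trip showing both the zlib-wrapped and the raw-deflate cases:

```ruby
require "zlib"

original = "<html><body>hello</body></html>"

# Zlib::Deflate.deflate compresses; Zlib::Inflate.inflate reverses it.
compressed = Zlib::Deflate.deflate(original)
restored   = Zlib::Inflate.inflate(compressed)
puts restored == original # => true

# Some servers send raw deflate data with no zlib wrapper; those need
# an inflater constructed with a negative window size.
raw = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
raw_compressed = raw.deflate(original, Zlib::FINISH)
raw.close
restored_raw = Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(raw_compressed)
```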
Yes, that's what I'm asking: since this is a redirect, how do I get the data from the page it redirects to?

I've rephrased my answer to make my point clearer. I'm not just saying it's a redirect; I also explain why you end up in a loop, which should hopefully be clear now.

Is open-uri mandatory, or would you be happy with another Ruby technique?

Another Ruby technique is definitely fine if necessary.
open-uri already handles redirects. It just raises an error when it runs into a redirect loop.
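open-uri can also be told not to follow redirects at all, in which case it raises OpenURI::HTTPRedirect on the first redirect instead of looping, so you can at least see where the server wanted to send you. A sketch (the helper name is mine; nothing here hits the network until you call it):

```ruby
require "open-uri"

# Fetch url without following redirects; on a redirect, return the
# target Location the server asked for instead of a response body.
def fetch_or_redirect_target(url)
  URI.open(url, redirect: false).read
rescue OpenURI::HTTPRedirect => e
  e.uri.to_s # the redirect target
end
```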