Ruby: 404 Not Found, but the URL loads fine in a web browser
I tried this with many URLs, and they all seemed to work fine, until I came across this one:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open("http://www.moxyst.com/fashion/men-clothing/underwear.html"))
puts doc
The result is:
/Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:353:in `open_http': 404 Not Found (OpenURI::HTTPError)
from /Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:709:in `buffer_open'
from /Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:210:in `block in open_loop'
from /Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:208:in `catch'
from /Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:208:in `open_loop'
from /Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:149:in `open_uri'
from /Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:689:in `open'
from /Users/macbookair/.rvm/rubies/ruby-2.0.0-p481/lib/ruby/2.0.0/open-uri.rb:34:in `open'
from test.rb:5:in `<main>'
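I can access the page through a web browser, so I don't understand this at all.
What is going on, and how can I handle this kind of error? Can I ignore it and let the remaining URLs be processed?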
I don't know what is going on, but you can handle it by rescuing the error:
begin
  doc = Nokogiri::HTML(open("http://www.moxyst.com/fashion/men-clothing/underwear.html"))
  puts doc
rescue => e
  puts "I failed: #{e}"
end
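"Can I ignore it and let the remaining URLs be processed?"
Sure! Maybe. Not sure. We don't know what your requirements are. You're getting a 404 Not Found (OpenURI::HTTPError), so if you want your code to continue, rescue that exception. Something like this should work: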
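require 'nokogiri'
require 'open-uri'

# URLs to fetch; add more entries as needed.
URLS = %w[
  http://www.moxyst.com/fashion/men-clothing/underwear.html
]

URLS.each do |url|
  begin
    # On Ruby 3.0+, call URI.open(url) instead of open(url).
    doc = Nokogiri::HTML(open(url))
  rescue OpenURI::HTTPError => e
    # Report the failure and move on to the next URL.
    puts "Can't access #{ url }"
    puts e.message
    puts
    next
  end
  puts doc.to_html
end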
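You can rescue a more generic exception, but then you may get strange output, or end up handling unrelated problems in a way that causes even more problems, so you'll need to decide how much granularity you need.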
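To make the granularity point concrete, here is a minimal sketch. The helper name body_or_nil and its opener: parameter are hypothetical (the parameter exists only so the behaviour can be tried without network access). It assumes Ruby 2.5 or newer, where open-uri provides URI.open; on older versions such as 2.0 you would call open(url) instead.

```ruby
require 'open-uri'

# Hypothetical helper: returns the page body, or nil when the server
# answers with an HTTP error status such as 404.
def body_or_nil(url, opener: URI.method(:open))
  opener.call(url) { |f| f.read }
rescue OpenURI::HTTPError => e
  # Only HTTP errors are handled here. A typo elsewhere in the code
  # (NoMethodError, NameError, ...) still surfaces instead of being
  # silently reported as "can't access".
  warn "Can't access #{url}: #{e.message}"
  nil
end
```

With a bare rescue => e in the same place, a bug in your own code would be reported as if the site were unreachable, which is the kind of confusing output described above.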
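If you want even more control, and want to do something different for a 401 than for a 404, you could inspect the HTTP response headers, the status of the response, or the exception message.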
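As a rough sketch of looking at the response status: OpenURI::HTTPError exposes the failed response through its io attribute, and io.status returns a pair such as ["404", "Not Found"]. The helper name action_for, the symbols it returns, and the handling of 5xx codes are assumptions made for illustration, not part of the original answer.

```ruby
require 'open-uri'

# Hypothetical helper: decide what to do with a failed request based on
# the status code carried by the exception.
def action_for(error)
  code, _message = error.io.status   # e.g. ["404", "Not Found"]
  case code
  when "401"  then :needs_credentials
  when "404"  then :skip
  when /\A5/  then :retry_later      # assumption: treat 5xx as temporary
  else             :give_up
  end
end

# Possible use inside the loop (not executed here):
#   rescue OpenURI::HTTPError => e
#     next if action_for(e) == :skip
#     raise
```

Matching on the status code is less brittle than matching on e.message, since the message text is meant for humans.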
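"I can access the page through a web browser, so I don't understand this at all."
Well, that could be something happening on the server side: maybe they don't like the User-Agent string you're sending? The OpenURI documentation shows how to change that header:
"Additional header fields can be specified by an optional hash argument."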
open("http://www.ruby-lang.org/en/",
"User-Agent" => "Ruby/#{RUBY_VERSION}",
"From" => "foo@bar.invalid",
"Referer" => "http://www.ruby-lang.org/") {|f|
# ...
}
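You may need to pass "User-Agent" as an argument to the open method. Some sites require a valid User-Agent; otherwise they don't respond at all, or they return a 404 Not Found error.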
doc = Nokogiri::HTML(open("http://www.moxyst.com/fashion/men-clothing/underwear.html", "User-Agent" => "MyCrawlerName (http://mycrawler-url.com)"))
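One way to check whether the User-Agent really is the cause is to request the same URL with different strings and compare the status codes. The sketch below is only an illustration: status_for and its opener: parameter are hypothetical names, and it assumes Ruby 2.5 or newer for URI.open (use open on older versions).

```ruby
require 'open-uri'

# Hypothetical helper: return the HTTP status code (as a string) that the
# server sends back for the given User-Agent.
def status_for(url, user_agent, opener: URI.method(:open))
  opener.call(url, { "User-Agent" => user_agent }) { |f| f.status.first }
rescue OpenURI::HTTPError => e
  # The error response still carries its status line.
  e.io.status.first
end

# Possible comparison (not executed here):
#   url = "http://www.moxyst.com/fashion/men-clothing/underwear.html"
#   puts status_for(url, "Ruby/#{RUBY_VERSION}")
#   puts status_for(url, "MyCrawlerName (http://mycrawler-url.com)")
```

If the two calls return different codes, the server is filtering on the header; if both fail, the cause lies elsewhere.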
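You're using Ruby 2+, so there's no need for require 'rubygems'. That requirement went away in Ruby 1.9. But what happened in my case was that I found the next one was invalid.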