Ruby: crawl a list of URLs and skip URLs that have no DNS

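I'm crawling a large list of URLs with Ruby, but not all of the URLs are live, and some have no DNS record at all. When the crawler hits one of those URLs, it crashes.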

require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'net/http'
require 'colorize'

URL_LIST = [
  'http://website.com',
  'http://website.net'
]

URL_LIST.each do |url|
  item = "#{url}"
  resp = Net::HTTP.get_response(URI.parse(item))

  case resp.code.to_i
  when 200
    puts "Success: #{url}".green
  when 301..303
    new_url = resp['location']
    puts "Redirect #{url} => #{new_url}".yellow
  else
    resp.code
  end
end
When I run this script and it reaches a bad URL, I get an error like this:

/Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `initialize': getaddrinfo: nodename nor servname provided, or not known (SocketError)
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `open'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:878:in `connect'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:852:in `start'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:583:in `start'
from /Users/<name>/.rvm/rubies/ruby-2.1.1/lib/ruby/2.1.0/net/http.rb:478:in `get_response'
from spider.rb:808:in `block in <main>'
from spider.rb:806:in `each'
from spider.rb:806:in `<main>'

Use a begin/rescue block to rescue the error and print the error message in red:

require 'net/http'
require 'colorize'

URL_LIST = [
  'http://website.com',
  'http://sdfasdfwqeasdfasdfr.com',
  'http://website.net'
]

URL_LIST.each do |url|
  begin
    resp = Net::HTTP.get_response(URI.parse(url))

    case resp.code.to_i
    when 200
      puts "Success: #{url}".green
    when 301..303
      new_url = resp['location']
      puts "Redirect #{url} => #{new_url}".yellow
    else
      # Any other status: print it rather than silently discarding it
      puts "Unexpected response #{resp.code}: #{url}"
    end
  rescue SocketError => e
    # DNS lookup failed: report the URL in red and move on to the next one
    puts "Error: #{url} - #{e}".red
  end
end
The output will look like this:

Redirect http://website.com => http://www.website.com/
Error: http://sdfasdfwqeasdfasdfr.com - getaddrinfo: nodename nor servname provided, or not known
Success: http://website.net
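
When crawling a long list, DNS failures are probably not the only thing that can go wrong: connections can be refused or time out, and a malformed URL fails before any request is made. The following is a minimal sketch of the same begin/rescue idea wrapped in a method. The name check_url is hypothetical, the set of rescued exception classes is an assumption about which failures should be skipped, and colorize is left out to keep it dependency-free. It also matches on response classes instead of numeric codes, so it treats every 2xx as success and every 3xx as a redirect, which is broader than the 200 and 301..303 checks above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns a one-line status string for a URL
# instead of letting a network error abort the whole crawl.
def check_url(url)
  resp = Net::HTTP.get_response(URI.parse(url))

  case resp
  when Net::HTTPSuccess     then "Success: #{url}"
  when Net::HTTPRedirection then "Redirect #{url} => #{resp['location']}"
  else                           "HTTP #{resp.code}: #{url}"
  end
rescue SocketError, SystemCallError, Timeout::Error, URI::InvalidURIError => e
  # SocketError:          DNS lookup failed
  # SystemCallError:      Errno::ECONNREFUSED, Errno::EHOSTUNREACH, ...
  # Timeout::Error:       Net::OpenTimeout and Net::ReadTimeout
  # URI::InvalidURIError: the string could not be parsed as a URL
  "Error: #{url} - #{e.message}"
end
```

It could be used as URL_LIST.each { |url| puts check_url(url) }, so that one bad entry never stops the loop.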

Comments:

With Ruby 1.9+ you don't need require 'rubygems', because it's built in now. If you want to do a pre-flight check, you can shell out to the OS with host www.example.com to see whether the FQDN resolves to a real address.

item = "#{url}" is wasting CPU, because url is already a string. Just use resp = Net::HTTP.get_response(URI.parse(url)).

I appreciate the heads-up. I made the appropriate changes.

This is perfect! Thank you.
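
The pre-flight check mentioned in the comments can also be done without shelling out to host. Below is a rough sketch that asks the system resolver through Addrinfo.getaddrinfo from the standard socket library. The method name resolvable? is hypothetical, and the sketch assumes every entry is a full URL with a scheme; anything without a host is treated as unresolvable.

```ruby
require 'socket'
require 'uri'

# Hypothetical helper: true if the URL's host resolves to at least
# one address according to the system resolver.
def resolvable?(url)
  host = URI.parse(url).hostname
  return false if host.nil?

  !Addrinfo.getaddrinfo(host, nil).empty?
rescue SocketError, URI::InvalidURIError
  # SocketError is what a failed lookup raises (the same error as in
  # the stack trace in the question).
  false
end
```

A list could then be filtered with URL_LIST.select { |url| resolvable?(url) } before fetching. A name that resolves can still belong to a server that is down, so the rescue around the request itself is still needed.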