Ruby: Checking for broken links with Watir


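I have an unordered list of links that I save off to the side, and I want to click each link and make sure it goes to a real page and doesn't return a 404, 500, etc.

The problem is that I don't know how to do it. Is there some object I can inspect that will give me the HTTP status code or anything similar?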

mylinks = Browser.ul(:id, 'my_ul_id').links

mylinks.each do |link|
  link.click

  # need to check for a 200 status or something here! how?

  Browser.back
end

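There's no need to use Watir for this. An HTTP HEAD request will tell you whether the URL resolves, and it will be faster.

Ruby's Net::HTTP can do it, or you can use OpenURI (the open-uri library).

With OpenURI you can request a URI and get a page back. Since you don't really care what the page contains, you can throw that part away and just report whether you got anything back: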

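require 'open-uri'

begin
  # Kernel#open no longer opens URLs as of Ruby 3.0; URI.open is the supported form
  body = URI.open('http://www.example.com').read
  puts body.empty? ? "isn't" : "is"
rescue OpenURI::HTTPError, SocketError
  # 4xx/5xx responses raise OpenURI::HTTPError; unresolvable hosts raise SocketError
  puts "isn't"
end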

The upside is that OpenURI follows HTTP redirects. The downside is that it returns the full page, so it can be slow.
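If you stay with OpenURI, the object it yields also reports the status line and the final URL after redirects, which says more than checking whether the body is empty. A minimal sketch, assuming Ruby 2.5 or later (for URI.open); open_uri_status is a hypothetical helper name. It still downloads the full body, so the speed caveat above still applies.

```ruby
require 'open-uri'

# Hypothetical helper: return [status_code, final_url] for +url+.
# OpenURI follows redirects itself, so final_url may differ from url.
def open_uri_status(url)
  URI.open(url) { |page| [page.status.first, page.base_uri.to_s] }
rescue OpenURI::HTTPError => e
  # 4xx/5xx responses raise; the status line is still available on e.io
  [e.io.status.first, url]
end
```

A link that returns 404 therefore comes back as ["404", url] instead of raising out of your loop.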

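Ruby's Net::HTTP can help somewhat, because it can make HTTP HEAD requests, which don't return the whole page, only the headers. That alone isn't enough to know whether the actual page is reachable, because the HEAD response might redirect to a page that doesn't resolve, so you have to follow the redirects until there are none left or you get an error. The Net::HTTP documentation has an example to get you started: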

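require 'net/http'
require 'uri'

# Follow redirects until a non-redirect response arrives, giving up after +limit+ hops.
def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))
  case response
  when Net::HTTPSuccess     then response
  # Location may be relative, so resolve it against the current URL
  when Net::HTTPRedirection then fetch(URI.join(uri_str, response['location']).to_s, limit - 1)
  else
    response.error! # raises for any other response, e.g. 404 or 500
  end
end

puts fetch('http://www.ruby-lang.org').code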

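Again, this example returns the pages, which can slow you down. You can replace get_response with a method that sends a HEAD request instead; it returns a response just like get_response does, which should help.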
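A sketch of the redirect loop above rewritten around HEAD requests (Net::HTTP#head), so no page bodies are transferred. It assumes you only need the final status code; head_status is a hypothetical name, the limit of 5 hops is arbitrary, and relative Location headers are resolved with URI.join.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow redirects using HEAD requests only and return the
# final status code as a String (e.g. "200" or "404").
def head_status(url, limit = 5)
  raise ArgumentError, 'HTTP redirect too deep' if limit.zero?

  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri) # status line and headers only, no body
  end

  if response.is_a?(Net::HTTPRedirection) && response['location']
    # Location may be relative (e.g. "/new/path"), so resolve it against the current URL
    head_status(URI.join(url, response['location']).to_s, limit - 1)
  else
    response.code
  end
end
```

Calling head_status('http://www.ruby-lang.org') follows any redirects and returns the last status code as a string.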

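In either case there's one more thing you have to consider. Many sites use "meta refresh", which causes the browser to reload the page with an alternate URL after parsing it. Handling these requires requesting the page and parsing it, looking for the <meta http-equiv="refresh"> tag.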
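A rough sketch of pulling the target URL out of a meta refresh tag, assuming the page body has already been fetched with a GET. It uses a regular expression only to stay dependency-free; a real HTML parser such as Nokogiri would cope better with unusual markup. meta_refresh_url is a hypothetical name, and the URL it returns would still need its own status check.

```ruby
require 'uri'

# Hypothetical helper: return the absolute target of a
# <meta http-equiv="refresh" content="5; url=..."> tag, or nil if there is none.
# The regex approach is a simplification, not a real HTML parser.
def meta_refresh_url(html, base_url)
  tag = html[/<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i]
  return nil unless tag

  # content="..." or content='...', matched with a backreference to the opening quote
  content = tag[/content\s*=\s*(["'])(.*?)\1/i, 2]
  return nil unless content

  # "5; url=/next" or "0;URL='http://...'" (the URL part may itself be quoted)
  target = content[/url\s*=\s*['"]?([^'"\s]+)/i, 1]
  target && URI.join(base_url, target).to_s
end
```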

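Other HTTP gems can also easily perform HEAD requests, so look at them too. In particular, Typhoeus can handle heavy loads through its companion Hydra, which makes it easy to run requests in parallel.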
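If adding a gem isn't an option, a rough stand-in for parallel requests can be built from plain Ruby threads. This is a different technique from Typhoeus/Hydra, not their API: a sketch assuming a small fixed pool of worker threads that take URLs from a Queue and issue one HEAD request each. parallel_head_codes, the default of 8 workers, and the 5-second timeouts are illustrative choices.

```ruby
require 'net/http'
require 'uri'

# Check many URLs concurrently with a fixed pool of worker threads.
# Returns a Hash of url => status code String, or the exception class name
# (e.g. "Errno::ECONNREFUSED") when the request itself fails.
def parallel_head_codes(urls, workers: 8)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # pop returns nil once the queue is closed and drained

  results = {}
  lock = Mutex.new

  threads = Array.new([workers, urls.size].min) do
    Thread.new do
      while (url = queue.pop)
        code = begin
          uri = URI.parse(url)
          Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                          open_timeout: 5, read_timeout: 5) do |http|
            http.head(uri.request_uri).code
          end
        rescue StandardError => e
          e.class.name
        end
        lock.synchronize { results[url] = code }
      end
    end
  end

  threads.each(&:join)
  results
end
```

Failures such as refused connections or DNS errors are recorded under their exception class name rather than aborting the whole run.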

EDIT:

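In case you haven't played with Typhoeus, here's what one of its responses looks like. It's very useful for the kind of situation you're looking at: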

(rdb:1) pp response
#<Typhoeus::Response:0x00000100ac3f68
 @app_connect_time=0.0,
 @body="",
 @code=302,
 @connect_time=0.055054,
 @curl_error_message="No error",
 @curl_return_code=0,
 @effective_url="http://www.example.com",
 @headers=
  "HTTP/1.0 302 Found\r\nLocation: http://www.iana.org/domains/example/\r\nServer: BigIP\r\nConnection: Keep-Alive\r\nContent-Length: 0\r\n\r\n",
 @http_version=nil,
 @mock=false,
 @name_lookup_time=0.001436,
 @pretransfer_time=0.055058,
 @request=
  :method => :head,
    :url => http://www.example.com,
    :headers => {"User-Agent"=>"Typhoeus - http://github.com/dbalatero/typhoeus/tree/master"},
 @requested_http_method=nil,
 @requested_url=nil,
 @start_time=nil,
 @start_transfer_time=0.109741,
 @status_message=nil,
 @time=0.109822>


If you have a lot of URLs to check, look at the parallel-request side of Typhoeus (Hydra).

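There is some philosophical debate about whether Watir or watir-webdriver should provide HTTP return-code information. The premise is that an ordinary "user", which is what Watir simulates on the DOM, is unaware of HTTP return codes. I don't necessarily agree with this, since my use case may differ slightly from the main one (performance testing, etc.), but that's how it is. This thread expresses some views on the distinction.

For now, there is no way to get the HTTP response code from Watir without supplementary tools such as proxies, Fiddler, HTTPWatch or tcpdump, or without dropping down to net/http-level checks in the middle of the script. Personally, I like using Firebug with the NetExport add-on to review tests afterwards.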

My answer is similar to the Tin Man's idea:

require 'net/http'
require 'uri'

mylinks = Browser.ul(:id, 'my_ul_id').links

mylinks.each do |link|
  u = URI.parse link.href
  # HEAD returns only the status line and headers; use_ssl is needed for https links
  status_code = Net::HTTP.start(u.host, u.port, use_ssl: u.scheme == 'https') { |http| http.head(u.request_uri).code }
  # testing with rspec (newer RSpec style: expect(status_code).to eq('200'))
  status_code.should == '200'
end

If you use Test::Unit as your test framework, I think you can test it like this:

assert_equal '200', status_code

Another example (incorporating Chuck van der Linden's idea): check the status code and log the URL if the status is bad.

require 'net/http'
require 'uri'

mylinks = Browser.ul(:id, 'my_ul_id').links

mylinks.each do |link|
  u = URI.parse link.href
  status_code = Net::HTTP.start(u.host, u.port, use_ssl: u.scheme == 'https') { |http| http.head(u.request_uri).code }
  unless status_code == '200'
    # append "<url> is <code>" to the log for every non-200 link
    File.open('error_log.txt', 'a+') { |file| file.puts "#{link.href} is #{status_code}" }
  end
end
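A variation on the logging idea, sketched under a few assumptions: the hrefs are collected before any checking (for example with Watir's browser.ul(id: 'my_ul_id').links.map(&:href)), non-HTTP links such as mailto: are skipped, and network errors are reported instead of stopping the loop. broken_links is a hypothetical name, it needs Ruby 2.7+ for filter_map, and the returned pairs can be written to error_log.txt exactly as above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: return [href, problem] pairs for every http(s) link
# whose HEAD request does not answer 200. Non-HTTP hrefs (mailto:, javascript:)
# are skipped; connection-level failures are reported by exception class name.
def broken_links(hrefs)
  hrefs.filter_map do |href|
    uri = begin
      URI.parse(href)
    rescue URI::InvalidURIError
      nil
    end
    next unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP

    begin
      code = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                             open_timeout: 5, read_timeout: 5) do |http|
        http.head(uri.request_uri).code
      end
      [href, code] unless code == '200'
    rescue StandardError => e
      [href, e.class.name] # DNS failure, refused connection, timeout, ...
    end
  end
end
```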

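If you have a large number of links, all of the previous solutions are inefficient, because they open a new HTTP connection to the hosting server for every single link.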
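Within Ruby, the per-link connection cost can be cut by grouping URLs by scheme, host and port and sending every HEAD request for a group through one Net::HTTP.start session, which keeps the connection open between requests. A sketch assuming the servers allow persistent (keep-alive) connections; head_codes_by_host is a hypothetical name, and redirects are reported as-is (e.g. "301") rather than followed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: HEAD every URL, reusing one connection per scheme/host/port.
# Returns a Hash of url => status code String.
def head_codes_by_host(urls)
  results = {}
  pairs = urls.map { |url| [url, URI.parse(url)] }

  pairs.group_by { |_url, uri| [uri.scheme, uri.host, uri.port] }.each do |(scheme, host, port), group|
    # One TCP (and, for https, TLS) handshake per group; every http.head below reuses it.
    Net::HTTP.start(host, port, use_ssl: scheme == 'https') do |http|
      group.each { |url, uri| results[url] = http.head(uri.request_uri).code }
    end
  end

  results
end
```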

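I wrote a one-line bash command that uses curl to fetch a list of links given on stdin and outputs the corresponding list of status codes, one per link. The key point is that curl fetches all the links in the same invocation, so it reuses HTTP connections, which improves speed dramatically.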

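However, curl will split the list into chunks of 256, which is still far more than 1! To make sure connections are actually reused, sort the links first (just use the sort command).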

cat <YOUR_LINKS_FILE_ONE_PER_LINE> | xargs curl --head --location -w '---HTTP_STATUS_CODE:%{http_code}\n\n' -s --retry 10 --globoff | grep HTTP_STATUS_CODE | cut -d: -f2 > <RESULTS_FILE>

Note that the command above follows HTTP redirects, retries 10 times on transient errors (timeouts or 5xx), and of course only fetches the headers.

Update: added --globoff so that curl won't expand any URL that contains {} or [].

One of the things is that I don't necessarily need the status code. I just need to verify that these dynamically generated links go to a real endpoint, and that the endpoint isn't erroring out, etc. I figured the status code would be an easy thing to check.

Unless you know what to expect at the end of each link and want to write specific Watir-based tests that look for specific content on the page, I'd say that for your purposes just looking at the result code is fine for a simple link check.

Thanks for the detailed reply! I'm not trying