Ruby performance of File.read

Given the following script:

require 'rubygems'
require 'open-uri'
require 'benchmark'

response = open('http://gdata.youtube.com/feeds/api/videos?q=skateboarding+dog')

outside = Benchmark.measure do
  response_body = response.read
  10000.times do
    response_body.scan(/dog/)
  end
end

inside = Benchmark.measure do
  10000.times do
    response.read.scan(/dog/)
  end
end

puts [outside, inside].map(&:utime).inspect
I get the following results:

[1.25, 0.06000000000000005]
Why is the version that reads the file on every iteration about 20 times faster?

In case my system information matters:

ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin12.4.0]

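This is because, after the first test, response has already been read to the end. In every iteration of the second test, the call to read is therefore trivial: there is nothing left to read, so it returns an empty string almost immediately. Scanning an empty string finishes just as quickly, so the second benchmark measures almost no work.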

irb> response.read.scan(/dog/)
=> ["dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "dog"]
irb> response.read.scan(/dog/)
=> []
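
To compare the two variants fairly, the stream has to be rewound before each read. The following is a minimal sketch, assuming a StringIO as a hypothetical stand-in for the HTTP response (so that it runs without network access) and assuming the real response object supports rewind in the same way. The sample text and iteration counts are made up for illustration.

```ruby
require 'stringio'
require 'benchmark'

# Hypothetical stand-in for the HTTP response: an in-memory stream.
response = StringIO.new('skateboarding dog ' * 1_000)

# Read once, then scan the same string repeatedly.
outside = Benchmark.measure do
  response.rewind
  response_body = response.read
  1_000.times do
    response_body.scan(/dog/)
  end
end

# Read and scan on every iteration.
inside = Benchmark.measure do
  1_000.times do
    response.rewind            # without this, read returns "" after the first pass
    response.read.scan(/dog/)
  end
end

puts [outside, inside].map(&:utime).inspect
```

With the rewind in place, both blocks scan the full body on every iteration, so the inside variant no longer gets its speed-up from reading an exhausted stream.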

open('http://gdata.youtube.com/feeds/api/videos?q=skateboarding+dog').read.scan(/dog/) => ["dog", "dog", "dog", ...]
@Kaleidoscope The key point is that read moves the read pointer of response (it acts as a stream). When you call read a second time, it returns an empty string.
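
The read pointer can be observed directly. This is a small sketch that again uses a StringIO as a hypothetical stand-in for the response; the variable names are illustrative only.

```ruby
require 'stringio'

# Hypothetical stand-in for the response stream (11 bytes long).
stream = StringIO.new('dog dog dog')

first = stream.read.scan(/dog/)    # consumes the whole stream
pos_after = stream.pos             # pointer now sits at the end
second = stream.read.scan(/dog/)   # read returns "", so nothing matches

stream.rewind                      # move the pointer back to the start
third = stream.read.scan(/dog/)    # the full content is available again

p first      # => ["dog", "dog", "dog"]
p pos_after  # => 11
p second     # => []
p third      # => ["dog", "dog", "dog"]
```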