Why is finding a node with the desired text faster with Ruby than with XPath?

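Recently I had to check whether an HTML node contains the desired text. I was surprised that when I refactored the code to use an XPath selector, it became 10 times slower. Here is a simplified version of the original code and the benchmark: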

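```ruby
# has_keyword_benchmark.rb
require 'benchmark'
require 'nokogiri'

# Nokogiri() is shorthand for Nokogiri.parse; this markup does not start
# with an <html> tag, so it is parsed as an XML document.
Doc = Nokogiri("
<div>
  <div>
    A
  </div>
  <p>
    <b>A</b>
  </p>
  <span>
    B
  </span>
</div>")

# XPath: selects the root element if its string value contains "A".
def has_keywords_with_xpath
  Doc.xpath('./*[contains(., "A")]').size > 0
end

# Ruby: concatenates all text in the document, then does a substring search.
def has_keywords_with_ruby
  Doc.text.include? 'A'
end

iterations = 10_000
Benchmark.bm(27) do |bm|
  bm.report('checking if has keywords with xpath') do
    iterations.times do
      has_keywords_with_xpath
    end
  end

  bm.report('checking if has keywords with ruby') do
    iterations.times do
      has_keywords_with_ruby
    end
  end
end
```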

Intuitively, checking whether a node contains some text with XPath should be faster, but it isn't. Does anyone know why?

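Typically, parsing and compiling an XPath expression takes longer than actually executing it, even on a fairly large document. For example, with Saxon, running the expression `count(/*[contains(., 'e')])` against a 1 MB source document took 200 ms to compile the path expression and about 18 ms to execute it.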

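If your XPath API lets you compile an XPath expression once and then execute it repeatedly (or if it caches compiled expressions behind the scenes), it is definitely worth taking advantage of that.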

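The actual XPath execution is likely to be at least as fast as hand-written navigation code, and possibly faster. It is the preparation that causes the overhead.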

Consider the fact that XPath will let you retrieve the context around the keyword, whereas `String#include?` will only tell you whether it is somewhere or not.

Benchmark output from the question's script:

```
                                  user     system      total        real
checking if has keywords with xpath  0.400000   0.020000   0.420000 (  0.428484)
checking if has keywords with ruby  0.020000   0.000000   0.020000 (  0.023773)
```