Ruby on Rails: How to navigate the DOM using Nokogiri

Tags: ruby-on-rails, ruby, dom, xpath, nokogiri

I am trying to populate the variables parent_element_h1 and parent_element_h2. Can anyone help me get the information I need into these variables?

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <div class='block' id='X1'>
        <h1>Foo</h1>
        <p id='para-2'>B</p>
      </div>
      <p id='para-3'>C</p>
      <h2>Bar</h2>
      <p id='para-4'>D</p>
      <p id='para-5'>E</p>
      <div class='block' id='X2'>
        <p id='para-6'>F</p>
      </div>
    </body>
  </html>
HTML_END

parent = value.css('body').first

# start_here is given: a Nokogiri::XML::Element for the <div> with the id 'X2'
start_here = parent.at('div.block#X2')

# this should be a Nokogiri::XML::Element of the nearest, previous h1.
# in this example it's the one with the value 'Foo'
parent_element_h1 = nil # to be filled in

# this should be a Nokogiri::XML::Element of the nearest, previous h2. 
# in this example it's the one with the value 'Bar'
parent_element_h2 = nil # to be filled in

After accepting an answer, I came up with my own solution. It works like a charm and I think it's pretty cool.

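If I understand your problem correctly, the approach I would take is to use XPath or CSS to find your "start_here" element and the parent element you want to search under. Then walk the tree recursively, starting at the parent, stopping when you hit the "start_here" element, and holding on to the last element that matched your style along the way.

Something like: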

parent = value.search("//body").first
div = value.search("//div[@id = 'X2']").first

find = FindPriorTo.new(div)

assert_equal('Foo', find.find_from(parent, 'h1').text)
assert_equal('Bar', find.find_from(parent, 'h2').text) 

where FindPriorTo is a simple class that handles the recursion:

class FindPriorTo
  def initialize(stop_element)
    @stop_element = stop_element
  end

  def find_from(parent, style)
    @should_stop = nil
    @last_style  = nil

    recursive_search(parent, style)
  end

  def recursive_search(parent, style)
    parent.children.each do |ch|
      recursive_search(ch, style)
      return @last_style if @should_stop

      @should_stop = (ch == @stop_element)
      @last_style = ch if ch.name == style
    end

    @last_style    
  end

end
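
If this approach doesn't scale well enough, you might be able to optimize it by rewriting recursive_search so that it doesn't use recursion, and by passing in both of the styles you are looking for and keeping track of the last one found for each, so that you don't have to traverse the tree a second time.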


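I was also going to suggest monkey patching Node to hook in while the document is being parsed, but it looks like all of that is written in C. You might be better served by something other than Nokogiri that has a native Ruby SAX parser, or, if speed is your real concern, by doing the search portion in C/C++ with Xerces or similar. I don't know how well those deal with parsing HTML, though.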

If you don't know the relationship between the elements, you can search for them anywhere in the document.

However, if you need to submit a form, you should use mechanize:


require 'mechanize'

# create the Mechanize agent (older releases exposed it as WWW::Mechanize)
mech = Mechanize.new
# load the site
mech.get("address")
# select a form; here the first one is used, pick the one you need from the array
form = mech.page.forms.first
# fill in the fields like this: form.name_of_the_field = value
form.element_name  = value
form.other_element = other_value

You can search the descendants of a Nokogiri HTML::Element using CSS selectors. You can traverse its ancestors with the .parent method.

parent_element_h1 = value.css("h1").first.parent
parent_element_h2 = value.css("h2").first.parent

Maybe this will do it. I'm not sure about the performance, or whether there are cases I haven't thought of.

def find(root, start, tag)
    ps, res = start, nil
    until res or (ps == root)
        ps  = ps.previous || ps.parent
        res = ps.css(tag).last
        res ||= ps.name == tag ? ps : nil
    end
    res || "Not found!"
end

parent_element_h1 =  find(parent, start_here, 'h1')
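
Here is my own solution (thanks to my co-worker for helping me with this one!). It uses a recursive method that checks every element, regardless of whether it is a sibling or a child of another sibling.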

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <div class='block' id='X1'>
        <h1>Foo</h1>
        <p id='para-2'>B</p>
      </div>
      <p id='para-3'>C</p>
      <h2>Bar</h2>
      <p id='para-4'>D</p>
      <p id='para-5'>E</p>
      <div class='block' id='X2'>
        <p id='para-6'>F</p>
      </div>
    </body>
  </html>
HTML_END

parent = value.css('body').first

# start_here is given: a Nokogiri::XML::Element for the <div> with the id 'X2'
@start_here = parent.at('div.block#X2')

# Search for parent elements of kind "_style" starting from _start_element
def search_for_parent_element(_start_element, _style)
  unless _start_element.nil?
    # have we already found what we're looking for?
    if _start_element.name == _style
      return _start_element
    end
    # _start_element is a div.block and not the _start_element itself
    if _start_element[:class] == "block" && _start_element[:id] != @start_here[:id]
      # begin recursion with last child inside div.block
      from_child = search_for_parent_element(_start_element.children.last, _style)
      if(from_child)
        return from_child
      end
    end
    # begin recursion with previous element
    from_child = search_for_parent_element(_start_element.previous, _style) 
    return from_child ? from_child : false
  else
    return false
  end
end

# this should be a Nokogiri::XML::Element of the nearest, previous h1.
# in this example it's the one with the value 'Foo'
puts parent_element_h1 = search_for_parent_element(@start_here,"h1")

# this should be a Nokogiri::XML::Element of the nearest, previous h2. 
# in this example it's the one with the value 'Bar'
puts parent_element_h2 = search_for_parent_element(@start_here,"h2")

I'm running across this a few years too late, I suppose, but I felt compelled to post because all the other solutions are way too complicated.

It's a single statement with XPath:

# doc is the parsed document (named value in the question)
start = doc.at('div.block#X2')

start.at_xpath('(preceding-sibling::h1 | preceding-sibling::*//h1)[last()]')
#=> <h1>Foo</h1>

start.at_xpath('(preceding-sibling::h2 | preceding-sibling::*//h2)[last()]')
#=> <h2>Bar</h2>

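This accommodates either direct previous siblings or descendants of previous siblings. Whichever one matches, the [last()] predicate ensures that you get the closest previous match.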

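Comments:

The problem is that I don't know whether the heading is a sibling or a child of a sibling. Your solution assumes that I do know. Also, my example data is much shorter than my real data: "my tag" can be anywhere in the document.

If you are unsure about the sibling/child relationship, you can use "//" in the XPath instead of "/html/body/" or even "/html/body//div".

I think my question wasn't specific enough. I have edited it and hope it is now clear what I am looking for (check the comments above the variables I am trying to fill with data).

Thanks for your submission. It's a bit of a hack, but it works in this example. It won't work, though, if start_here is inside another div block. What I'm looking for is a way to get the nearest previous heading, regardless of where it sits in the document hierarchy.

Yes, the trimming is a bit of a hack. Please check whether my edited answer is what you are looking for.

This doesn't solve my problem, but I have edited my question to be more specific. Note the comments above the two variables I am trying to fill.

In short: this doesn't work because it matches more than just the nearest previous h1 or h2.

This doesn't return the result I'm looking for. Please read the question again.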