Ruby Hpricot: How to extract inner text without the other HTML child elements


ruby parsing vim rspec


I'm working on the vim rspec plugin (https://github.com/skwp/vim-rspec), and I'm parsing some of the HTML that rspec produces. It looks like this:

doc = %{
<dl>
  <dt id="example_group_1">This is the heading text</dt>
  Some puts output here
 </dl>
}

I can get at the <dt> by using

(Hpricot.parse(doc)/:dl).first/:dt

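But how do I get at the "Some puts output here" part? If I use inner_html, there is too much other junk to parse. I've looked through the Hpricot docs but haven't found an easy way to get the inner text of an HTML element while ignoring its HTML children.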

Note that this is bad HTML. If you have control over it, you should wrap the content you want in a <dd>.

In XML terms, what you are looking for is the TextNode that follows the <dt> element. In my comment, I showed how to select this node with XPath in Nokogiri.
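As a minimal sketch of what "the TextNode following the <dt>" means in practice, the snippet below uses Ruby's bundled REXML library instead of Hpricot or Nokogiri (a swapped-in library, not the one discussed in this thread). It parses the markup from the question, finds the first <dt>, and reads the very next sibling node, which is the loose text. The helper name text_after_dt is hypothetical.

```ruby
require 'rexml/document'

# Markup from the question.
DOC = %{
<dl>
  <dt id="example_group_1">This is the heading text</dt>
  Some puts output here
 </dl>
}

# Hypothetical helper; REXML stands in for Hpricot/Nokogiri (assumption).
# Returns the stripped text of the node right after the first <dt>,
# or nil when that node is missing or is not a text node.
def text_after_dt(html)
  root = REXML::Document.new(html).root
  dt = root.elements['dt']            # first <dt> child of the root <dl>
  return nil if dt.nil?

  node = dt.next_sibling_node         # any node type, not just elements
  node.is_a?(REXML::Text) ? node.value.strip : nil
end

puts text_after_dt(DOC)   # prints "Some puts output here"
```

next_sibling_node is used rather than next_element because next_element skips text nodes and would never return the loose text.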

However, if you must use Hpricot and cannot select text nodes with it, you can get the inner_html and then strip out what you don't want:

# [^>]* also matches a <dt> that carries attributes, like the id in the question
(Hpricot.parse(doc)/:dl).first.inner_html.sub(%r{<dt\b[^>]*>.*?</dt>}m, '').strip
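Since inner_html is just a string, the stripping step can be checked without Hpricot installed. This sketch assumes inner_html returns the <dl> contents roughly as written in the question (the exact whitespace is an assumption). It also shows why a pattern spelled as a literal <dt> needs loosening: the heading in the question carries an id attribute. strip_dt and INNER_HTML are hypothetical names.

```ruby
# Hypothetical stand-in for what inner_html returns for the <dl> in the
# question; the exact whitespace Hpricot emits is an assumption.
INNER_HTML = %{
  <dt id="example_group_1">This is the heading text</dt>
  Some puts output here
 }

# [^>]* lets the opening tag carry attributes such as id="...",
# \b stops <dt from matching longer tag names, and /m lets .*? span
# newlines in case the heading text wraps.
def strip_dt(html)
  html.sub(%r{<dt\b[^>]*>.*?</dt>}m, '').strip
end

puts strip_dt(INNER_HTML)   # prints "Some puts output here"

# A literal <dt> never matches the attributed tag, so nothing is removed.
p INNER_HTML.sub(%r{<dt>.+?</dt>}, '') == INNER_HTML   # prints true
```

Note that sub removes only the first <dt>; gsub would be needed if a <dl> holds several headings.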

In the end, I found a way myself by walking the children manually:

(@context/"dl").each do |dl|
  dl.children.each do |child|
    if child.is_a?(Hpricot::Elem) && child.name == 'dd'
      # do stuff with the element
    elsif child.is_a?(Hpricot::Text)
      # loose text between elements, e.g. "Some puts output here"
      text = child.to_s.strip
      puts text unless text.empty?
    end
  end
end
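The same child-walking loop can be tried without Hpricot installed. Below is a rough port to Ruby's bundled REXML, assuming REXML::Element and REXML::Text play the roles of Hpricot::Elem and Hpricot::Text; it collects the loose text instead of printing it so the result can be checked. dl_loose_text is a hypothetical helper.

```ruby
require 'rexml/document'

DOC = %{
<dl>
  <dt id="example_group_1">This is the heading text</dt>
  Some puts output here
 </dl>
}

# Hypothetical helper: REXML::Element / REXML::Text are assumed to play
# the roles of Hpricot::Elem / Hpricot::Text in the loop above.
def dl_loose_text(html)
  xml = REXML::Document.new(html)
  found = []
  REXML::XPath.each(xml, '//dl') do |dl|
    dl.children.each do |child|
      if child.is_a?(REXML::Element) && child.name == 'dd'
        # do stuff with the <dd> element here
      elsif child.is_a?(REXML::Text)
        text = child.value.strip
        found << text unless text.empty?   # skip whitespace-only nodes
      end
    end
  end
  found
end

p dl_loose_text(DOC)   # prints ["Some puts output here"]
```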

With Nokogiri, this would be:

Nokogiri.XML(doc, &:noblanks).at_xpath('/dl/text()').content.strip
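The Nokogiri one-liner relies on the XPath expression /dl/text(), which is plain XPath 1.0, so other engines can evaluate it too. As a hedged sketch, here is the same query in Ruby's bundled REXML (chosen only because it ships with Ruby). This code does not use a noblanks equivalent, so whitespace-only text nodes are filtered out after the query instead. dl_text_xpath is a hypothetical name.

```ruby
require 'rexml/document'

DOC = %{
<dl>
  <dt id="example_group_1">This is the heading text</dt>
  Some puts output here
 </dl>
}

# Hypothetical helper. '/dl/text()' matches every text node directly under
# the root <dl>, including whitespace-only ones; Nokogiri's noblanks option
# is meant to drop those at parse time, whereas here they are filtered after.
def dl_text_xpath(html)
  xml = REXML::Document.new(html)
  REXML::XPath.match(xml, '/dl/text()')
              .map { |node| node.value.strip }
              .reject(&:empty?)
end

p dl_text_xpath(DOC)   # prints ["Some puts output here"]
```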

I suggest writing the plugin so that it works with both Nokogiri and Hpricot. Nokogiri has become the de facto standard for XML/HTML parsing in Ruby.