Ruby: How to get the text after or before certain tags with Nokogiri

I have an HTML document that looks something like this:

<root><template>title</template>
<h level="3" i="3">Something</h>
<template element="1"><title>test</title></template>
# one
# two
# three
# four
<h level="4" i="5">something1</h>
some random test
<template element="1"><title>test</title></template>
# first
# second
# third
# fourth
<template element="2"><title>testing</title></template>

I want to extract:

# one
# two 
# three
# four
# first
# second
# third
# fourth
</root>

In other words, I want all of the text that comes after <title>test</title> and before the next tag that starts after it.

I can get all of the text directly under root with "//root/text()", but how do I get only the text before or after particular tags?

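I'm fairly sure krusty.ar is right that there is no built-in way to do this. If you like, you can remove all of the tags inside the root tag one by one. It's a hack, but it works: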

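require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open(url)) # or Nokogiri::HTML.parse(File.open(file_path))
doc.xpath('//template').remove      # drop every <template> element, including its <title>
doc.xpath('//h').remove             # drop every <h> element
puts doc.text                       # only the bare text nodes are left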

For the HTML you posted, this gives the result you are after.

This seems to work:

require 'nokogiri'

xml = '<root>
    <template>title</template>
    <h level="3" i="3">Something</h>
    <template element="1">
        <title>test</title>
    </template>
    # one
    # two
    # three
    # four
    <h level="4" i="5">something1</h>
    some random test
    <template element="1">
        <title>test</title>
    </template>
    # first
    # second
    # third
    # fourth
    <template element="2">
        <title>testing</title>
    </template>
</root>
'

doc = Nokogiri::XML(xml)
text = (doc / 'template[@element="1"]').map{ |n| n.next_sibling.text.strip.gsub(/\n  +/, "\n") }
puts text
# >> # one
# >> # two
# >> # three
# >> # four
# >> # first
# >> # second
# >> # third
# >> # fourth

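I don't think there is a selector for that, though I could be wrong, of course. Maybe you could use a SAX approach.

Could you elaborate on the SAX approach?

SAX is a way of walking through the document one tag at a time instead of referring to a specific tag with a selector. After some research, though, I think you would run into the same problem: you can get the root's text, but not the text between one tag and another. I think you would need to remove all the other tags from the "//root/text()" result or, if you can change the template format, wrap the in-between text in a tag of its own.

It's not HTML, it's XML. Also, the sample XML is not well-formed, because the closing root node comes after the sample text.

This will leave "some random test" in the middle of the result.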