Ruby Nokogiri: replacing inner text with <span>-wrapped words
Here is a sample HTML fragment:
<p class="stanza">Thus grew the tale of Wonderland:<br/>
Thus slowly, one by one,<br/>
Its quaint events were hammered out -<br/>
And now the tale is done,<br/>
And home we steer, a merry crew,<br/>
Beneath the setting sun.<br/></p>
I need to surround each word with a <span> element that carries a sequential id, so that the result looks like this:
<span id='w1'>Anon,</span> <span id='w2'>to</span> <span id='w3'>sudden</span>
<span id='w4'>silence</span> <span id='w5'>won,</span> ....
This is what I wrote; it builds the new fragment. How do I swap it in for the old one?
def callchildren(n)
  n.children.each do |n| # call recursively until arrive at a node w/o children
    callchildren(n)
  end
  if n.node_type == 3 && n.to_s.strip.empty? != true
    new_node = ""
    n.to_s.split.each { |w|
      new_node = new_node + "<span id='w#{$word_number}'>#{w}</span> "
      $word_number += 1
    }
    # puts new_node
    # HELP? How do I get new_node swapped in?
  end
end
Here is my attempt at a solution to your problem:
require 'nokogiri'

Inf = 1.0/0.0

def number_words(node, counter = nil)
  # define infinite counter (Ruby >= 1.8.7)
  counter ||= (1..Inf).each
  doc = node.document

  unless node.is_a?(Nokogiri::XML::Text)
    # recurse for children and collect all the returned
    # nodes into an array
    children = node.children.inject([]) { |acc, child|
      acc + number_words(child, counter)
    }
    # replace the node's children
    node.children = Nokogiri::XML::NodeSet.new(doc, children)
    return [node]
  end

  # for text nodes, we generate a list of span nodes
  # and return it (this is more secure than OP's original
  # approach that is vulnerable to HTML injection)
  # node.content is the unescaped text; split ignores surrounding whitespace
  node.content.split.inject([]) { |acc, word|
    span = Nokogiri::XML::Node.new("span", doc)
    span.content = word
    span["id"] = "w#{counter.next}"
    # add a space if we are not at the beginning
    acc << Nokogiri::XML::Text.new(" ", doc) unless acc.empty?
    # add our new span to the collection
    acc << span
  }
end

# demo
if __FILE__ == $0
  h = <<-HTML
    <p class="stanza">Thus grew the tale of Wonderland:<br/>
    Thus slowly, one by one,<br/>
    Its quaint events were hammered out -<br/>
    And now the tale is done,<br/>
    And home we steer, a merry crew,<br/>
    Beneath the setting sun.<br/></p>
  HTML
  doc = Nokogiri::HTML.parse(h)
  # start from the root element rather than the document node itself
  number_words(doc.root)
  puts doc.to_html
end
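The ids stay consecutive across recursive calls because every call receives the same Enumerator, and an Enumerator remembers its own position. A minimal sketch of just that idea, without Nokogiri: nested arrays stand in for elements and strings stand in for text nodes. The `number_leaves` helper and the sample data are hypothetical and only illustrate the counter, not the answer's actual node handling.

```ruby
# Arrays play the role of elements, strings the role of text nodes.
def number_leaves(tree, counter = nil)
  # Range#each without a block returns an external Enumerator.
  counter ||= (1..Float::INFINITY).each
  tree.map do |item|
    if item.is_a?(Array)
      # Pass the same counter object down so numbering continues.
      number_leaves(item, counter)
    else
      item.split.map { |word| "w#{counter.next}:#{word}" }
    end
  end
end

result = number_leaves(["Thus grew", ["the tale"], "of Wonderland:"])
p result.flatten
```

Passing the counter as an argument avoids a global such as `$word_number` while still sharing state between the recursive calls.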
Given a Nokogiri::HTML::Document in doc, you can do the following:
i = 0
doc.search('//p[@class="stanza"]/text()').each do |n|
  spans = n.content.scan(/\S+/).map do |s|
    "<span id=\"w#{i += 1}\">" + s + '</span>'
  end
  n.replace(spans.join(' '))
end
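I'm new to Ruby, and I must say I really like the infinite loop construct and the use of .inject. I guess my problem was not "fully" embracing Nokogiri and the need to add nodes. Thanks a lot. And I'm smarter for it.
This works too, but it isn't as general for different, more complex HTML layouts. Thanks a lot for your help :P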