Ruby: How do I further process a Nokogiri::XML::Element?

Tags: ruby, nokogiri


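I wrote a short script in Ruby that uses Nokogiri to extract some data from a web page. The script works, but at the moment it returns several nested tags as a single Nokogiri::XML::Element.

The script is as follows: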

require 'rubygems'
require 'nokogiri'

#some dummy content that mimics the structure of the web page
dummy_content = '<div id="div_saadi"><div><div style="padding:10px 0"><span class="t4">content</span>content outside of the span<span class="t2">morecontent</span>morecontent outside of the span</div></div></div>'
page = Nokogiri::HTML(dummy_content)

#grab the second div inside of the div entitled div_saadi
result = page.css('div#div_saadi div')[1]

puts result
puts result.class

This is the output:

<div style="padding:10px 0">
<span class="t4">content</span>content outside of the span<span class="t2">morecontent</span>morecontent outside of the span
</div>
Nokogiri::XML::Element

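You're close, but you don't understand what you are getting back.

Depending on the HTML markup, you can get embedded tags. That is what is happening here: you asked for a single node, but that node contains other nodes: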

puts page.css('div#div_saadi div')[1].to_html
# >> <div style="padding:10px 0">
# >> <span class="t4">content</span>content outside of the span<span class="t2">morecontent</span>morecontent outside of the span</div>

Instead, you have to iterate over the individual embedded nodes and extract their text:

require 'nokogiri'

dummy_content = '<div id="div_saadi"><div><div style="padding:10px 0"><span class="t4">content</span>content outside of the span<span class="t2">morecontent</span>morecontent outside of the span</div></div></div>'
page = Nokogiri::HTML(dummy_content)

result = page.css('div#div_saadi div')[1]
puts result.children.map(&:text)

# >> content
# >> content outside of the span
# >> morecontent
# >> morecontent outside of the span

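children returns all of the embedded nodes as a NodeSet. Iterating over it yields each node in turn, and calling text on a particular node returns what you want.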

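Comments:

You need to provide a more comprehensive HTML sample in your question. Don't point us to links, because links rot and die.

result will be a NodeSet, which is like an array of nodes. Nokogiri can return the text/content of a NodeSet or of a single node. Perhaps you should search for how to do that?

Sorry, but neither of these comments makes sense to me. First, I did give you an HTML example; it is just the content, nothing more. Secondly, it does not return a NodeSet. It returns a Nokogiri::XML::Element, which is exactly why I asked this question.

require 'rubygems' has not been needed since Ruby 1.9.
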
result = page.css('div#div_saadi div')[1].text
# => "contentcontent outside of the spanmorecontentmorecontent outside of the span"