Get the href link of <a> tags with Nokogiri (Html, Ruby, Html Parsing, Nokogiri)

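I'm scraping some data that sits under /h2/a, but the href of the a element should contain http://www.thedomain.com. All the links look like thedomain.com/test and so on. Right now I only get the link text, not the href itself.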

For example:

<h2>
<a href="http://www.thedomain.com/test">Hey there</a>
<a href="http://www.thedomain.com/test1">2nd link</a>
<a href="http://www.thedomain.com/test2">3rd link</a>
</h2>

This gives me

Hey there, 2nd link, 3rd link

whereas I want http://www.thedomain.com/test and so on.

Just select @href instead of text(). That is, instead of

html_doc.xpath('//h2/a[contains(@href, "http://www.thedomain.com")]/text()')

use

html_doc.xpath('//h2/a[contains(@href, "http://www.thedomain.com")]/@href')

You can also use CSS selectors (probably easier than XPath in this case). Select the links with

html_doc.css('h2 a')

and then read each link's href attribute:

require 'nokogiri'

html = <<EOT
<html>
    <h2>
        <a href="http://www.thedomain.com/test">Hey there</a>
        <a href="http://www.thedomain.com/test1">2nd link</a>
        <a href="http://www.thedomain.com/test2">3rd link</a>
    </h2>
</html>
EOT

html_doc = Nokogiri::HTML(html)

# Select every <a> inside an <h2> and read its href attribute.
hrefs = html_doc.css('h2 a').map { |link| link['href'] }
puts hrefs
# http://www.thedomain.com/test
# http://www.thedomain.com/test1
# http://www.thedomain.com/test2