Ruby: Getting a link from Mechanize/Nokogiri

I'm trying to find the best way to retrieve an a href link from a Nokogiri node. Here's where I am so far:

mech = Mechanize.new 
mech.get(HOME_URL) 

mech.page.search('.listing_content').each do |business| 
  website = business.css('.website-feature')
  puts website.class
  puts website.inner_html
end
Output =>

Nokogiri::XML::NodeSet
<a href="http://urlofsite.com" class="track-visit-website no-tracks" onclick='omniture.callClick({"eVar6":6,"eVar9":1,"eVar21":"search_results","eVar50":null,"prop17":"cars","prop26":"64c15af0-a558-012f-a041-00215a4685f6","eVar42":"64c15af0-a558-012f-a041-00215a4685f6","prop27":6,"prop38":"search_results","prop39":1,"prop46":null,"events":"event6,event7","eVar51":optimostIDs.trialID.toString(),"eVar52":optimostIDs.segmentID.toString(),"eVar53":optimostIDs.creativeID.toString(),"eVar54":optimostIDs.subjectID.toString(),"prop47":null,"prop51":optimostIDs.trialID.toString(),"prop52":optimostIDs.segmentID.toString(),"prop53":optimostIDs.creativeID.toString(),"prop54":optimostIDs.subjectID.toString(),"prop56":"Saint+George%2C+UT","prop57":null,"prop58":false,"prop59":null,"eVar60":"relevancyTest2","prop60":"relevancyTest2","prop61":false,"prop62":null,"prop64":null,"prop67":null,"prop68":null,"prop70":null,"prop71":null});; atti_logs.attiClick({"iid":"651691e0-a558-012f-2ca7-18a9053c171a","lt":6,"ptid":"www.yellowpages.com","rid":"vendetta-236e7298-3a4f-4744-8ff5-4eb5fcc8e188","ypid":3848879,"lid":3848879,"vrid":"64c15af0-a558-012f-a041-00215a4685f6","nav":null});' rel="nofollow" target="_blank" title="Executive Service Ctr Website"><span class="raquo">»</span> Website</a>

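Basically, I just need to get http://urlofsite.com out of the inner_html, and I'm not sure how to do that. I've read about using CSS and XPath, but at this point I can't get either to work. Thanks for any help.

First, tell Nokogiri to get a node rather than a NodeSet. at_css retrieves a single Node, whereas css retrieves a NodeSet, which behaves like an Array.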

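Instead of:

website = business.css('.website-feature')

Try:

website = business.at_css('.website-feature')

This retrieves the first instance of a node with class="website-feature". If the first instance isn't the one you want, you'll need to narrow it down by grabbing the NodeSet and then indexing into it. Without the surrounding HTML, it's hard to help more.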

To get the href attribute from a node, simply treat the node like a hash:

website['href']

That should return:

http://urlofsite.com

Here's a small sample from IRB:

irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> 
irb(main):003:0*   html = '<a class="this_node" href="http://example.com">'
=> "<a class=\"this_node\" href=\"http://example.com\">"
irb(main):004:0> doc = Nokogiri::HTML.parse(html)
=> #<Nokogiri::HTML::Document:0x8041e2ec name="document" children=[#<Nokogiri::XML::DTD:0x8041d20c name="html">, #<Nokogiri::XML::Element:0x805a2a14 name="html" children=[#<Nokogiri::XML::Element:0x805df8b0 name="body" children=[#<Nokogiri::XML::Element:0x8084c5d0 name="a" attributes=[#<Nokogiri::XML::Attr:0x80860170 name="class" value="this_node">, #<Nokogiri::XML::Attr:0x8086047c name="href" value="http://example.com">]>]>]>]>
irb(main):005:0> 
irb(main):006:0*   doc.at_css('a.this_node')['href']
=> "http://example.com"
irb(main):007:0> 
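Thanks for the info. Whenever I try to grab the node with at_css('a.track-visit-website-no-tracks') it returns a class. I'm going to edit my post and keep looking.

After another look, I was able to get it exactly as you described. Thanks for your help. The Tin Man has a heart after all ;)

I'm glad it worked. Nokogiri is a great XML/HTML parser, so thank the team for that.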