Ruby Mechanize replacing `~` with `‾` when reading a link's href from a webpage
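I'm using the Mechanize Ruby gem to scrape some content from epinions.com, but for some reason a few links are not read correctly. This is caused by Mechanize replacing `~` with `‾`, and as a result Mechanize can't click the link.
Examples of an unsuccessful and then a successful scrape: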
# script
require 'mechanize'

agent = Mechanize.new
page_1 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-AtomicPark_com/display_~reviews")
puts page_1.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect
page_2 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-Vanns_com/display_~reviews")
puts page_2.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect
# result
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-AtomicPark_com/display_‾full_specs">
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">
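The thread does not settle the cause. One hypothesis (an assumption, not confirmed here) is that the page's encoding gets guessed as Shift_JIS, where byte 0x7E can decode to U+203E OVERLINE instead of `~`. Either way, a minimal workaround sketch is to confirm which character you actually received and map `‾` back to `~` before following the link. `normalize_href` and `absolute_link` are hypothetical helpers, not part of Mechanize; the sketch uses only Ruby's standard `uri` library and assumes the overline never appears legitimately in these hrefs.

```ruby
require 'uri'

# Hypothetical helper, not part of Mechanize: turn U+203E OVERLINE back
# into "~". Assumes the overline never appears legitimately in these hrefs.
def normalize_href(href)
  href.tr("\u203E", "~")
end

# Hypothetical helper: resolve the repaired href against the page URL so
# the result can be fetched directly instead of clicking the broken link.
def absolute_link(base_url, href)
  URI.join(base_url, normalize_href(href)).to_s
end

bad = "/webs-Web_Services-All-Merchants-AtomicPark_com/display_\u203Efull_specs"

# Diagnose: print the code point of the character right after "display_".
puts format("U+%04X", bad[/display_(.)/, 1].ord)

# Build an absolute URL from the repaired href.
puts absolute_link(
  "http://www.epinions.com/webs-Web_Services-All-Merchants-AtomicPark_com/display_~reviews",
  bad
)
```

With Mechanize, the repaired URL could then be fetched with something like `agent.get(absolute_link(page_1.uri.to_s, link.href))` in place of `link.click`; this has not been tested against the original site, and it treats the symptom rather than whatever causes the mis-decoding.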
Any idea why this happens?
This works fine for me:
[14:29] arkham ~/Desktop [2.1.0]
↳ $ ruby mechanize.rb
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-AtomicPark_com/display_~full_specs">
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">
Which version of Ruby are you using?
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin12.4.0] and mechanize (2.7.1).
I tried updating mechanize to 2.7.3, but no luck.
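Since the discussion turns on differences between setups, a small script that prints the exact interpreter and the installed gem versions may make the comparison more precise. Including nokogiri is an assumption: Mechanize parses HTML through Nokogiri, so its version could also differ between machines. The helper name `environment_report` is hypothetical; the script relies only on Ruby's built-in `RUBY_DESCRIPTION` constant and RubyGems' `Gem::Specification.find_all_by_name`.

```ruby
require 'rubygems'

# Hypothetical helper: collect the Ruby description plus the installed
# versions of each named gem (or "not installed" if none are found).
def environment_report(gem_names = %w[mechanize nokogiri])
  lines = [RUBY_DESCRIPTION]
  gem_names.each do |name|
    versions = Gem::Specification.find_all_by_name(name).map { |spec| spec.version.to_s }
    lines << "#{name}: #{versions.empty? ? 'not installed' : versions.join(', ')}"
  end
  lines
end

puts environment_report
```

Running this on both machines and comparing the output side by side would show whether the Ruby, Mechanize, or Nokogiri versions differ.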