Ruby on Rails: scraping sub-pages in Rails
My application connects to a web page, reads its links, and scrapes the pages those links point to:
main page --> links (sub-pages) --> scrape and extract information (title)
For some reason the title is readable, but I get the error output shown below (line 89 is scraper_sw.scrape(uri_sw).each do |subpag|).
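I'd really appreciate your help and your time.
Scraper::Base.parser :html_parser
Sorry, I was wondering whether Scraper is something you wrote, or a gem I can look at somewhere? It isn't part of the Ruby standard library. Since the error you're seeing is undefined method `titleweb', I'd like to see what result :titleweb is doing, but it looks like that is defined in Scraper.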
scraper = Scraper.define do
  array :items
  process "div.mozaique>div", :items => Scraper.define {
    process "div.thumb>a", :link => "@href"
    result :link
  }
  result :items
end
scraper_sw = Scraper.define do #this is the subweb
  array :subitems
  process "div#main", :subitems => Scraper.define {
    process "div#main>h1>h2", :titleweb => :text
    result :titleweb
  }
  result :subitems
end
uri = URI.parse(URI.encode(web))
scraper.scrape(uri).each do |pag|
  link_subweb = uri + pag.link.to_str
  savedata_array = JPG.new(:link_web => link_subweb.to_s,
                           :source => "server-1"
  )
  uri_sw = URI.parse(URI.encode(link_subweb.to_s))
  scraper_sw.scrape(uri_sw).each do |subpag|
    savedata_subweb_array = JPG.new(:title => subpag.titleweb)
  end
end
e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift) connect_serv1.rb
/Users/sss/web/connect_serv1.rb:93:in `block (2 levels) in <class:JPG>': undefined method `titleweb' for "Title 1 Square picture from NYC":String (NoMethodError)
from /Users/sss/web/connect_serv1.rb:89:in `each'
from /Users/sss/web/connect_serv1.rb:89:in `block in <class:JPG>'
from /Users/sss/web/connect_serv1.rb:77:in `each'
from /Users/sss/web/connect_serv1.rb:77:in `<class:JPG>'
from /Users/sss/web/connect_serv1.rb:18:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
Process finished with exit code 1
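The trace reports that the receiver of titleweb is the String "Title 1 Square picture from NYC", which suggests that each subpag may already be the title text rather than an object with a titleweb reader. Below is a minimal sketch in plain Ruby that does not use Scraper at all; the subitems array is a hypothetical stand-in for whatever scraper_sw.scrape(uri_sw) yields, and the respond_to? guard is only one possible way to handle both shapes:

```ruby
# Hypothetical stand-in for the inner scraper's result: the trace suggests
# it yields plain strings, not objects exposing a `titleweb` method.
subitems = ["Title 1 Square picture from NYC"]

# Use the accessor when the item has one; otherwise treat the item itself
# as the title.
titles = subitems.map do |subpag|
  subpag.respond_to?(:titleweb) ? subpag.titleweb : subpag.to_s
end

# Reproduce the failure from the trace: String has no `titleweb` method.
error_message = nil
begin
  subitems.first.titleweb
rescue NoMethodError => e
  error_message = e.message
end

puts titles.inspect
puts error_message
```

If the real result does turn out to be a String, passing subpag directly as the :title value would be the simplest change; that is an assumption to confirm against the Scraper library's documentation for result.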