Ruby on Rails: converting a Rails task into a rake task
I currently have this file in my models/ folder:
class Show < ActiveRecord::Base
  require 'nokogiri'
  require 'open-uri'
  has_many :user_shows
  has_many :users, through: :user_shows
  def self.update_all_screenings
    Show.all.each do |show|
      show.update_attribute(:next_screening, Show.update_next_screening(show.url))
    end
  end
  def self.update_next_screening(url)
    nextep = Nokogiri::HTML(open(url))
    ## Finds the title of the show, extracts the date of the next episode and converts it to a string ##
    begin
      title = nextep.at_css('h1').text
      date = nextep.at_css('.next_episode .highlight_date').text[/\d{1,2}\/\d{1,2}\/\d{4}/]
      date = date.to_s
      ## If the show airs today, the page shows a time instead of a date, so this checks whether
      ## there is a date. If there is, it is used; if not, today's date is combined
      ## with the time the show airs
      if date =~ /\d{1,2}\/\d{1,2}\/\d{4}/
        showtime = DateTime.strptime(date, "%m/%d/%Y")
      else
        date = DateTime.now.strftime("%D")
        time = nextep.at_css('.next_episode .highlight_date').text[/\dPM|\dAM/]
        time = time.to_s
        showtime = date + " " + time
        showtime = DateTime.strptime(showtime, "%m/%d/%y %l%p")
      end
      return showtime
    rescue
      return nil
    end
  end
end
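The date-or-time fallback in update_next_screening can be pulled out into a pure function, which makes it testable without any HTTP requests. This is a minimal sketch, assuming the text of the .next_episode .highlight_date element is passed in; parse_showtime is a hypothetical name. It also widens the hour pattern to \d{1,2}, because the original \dPM|\dAM captures only the trailing digit of a two-digit hour (10PM is read as 0PM).

```ruby
require 'date'

# Hypothetical helper (not part of the original model).
# text  - contents of the ".next_episode .highlight_date" element
# today - the date to use when only an airing time is present
def parse_showtime(text, today = Date.today)
  if (date = text[/\d{1,2}\/\d{1,2}\/\d{4}/])
    # A full m/d/yyyy date is present: use it as-is.
    DateTime.strptime(date, '%m/%d/%Y')
  elsif (time = text[/\d{1,2}(?:AM|PM)/])
    # Only a time such as "8PM" or "10PM": assume the show airs today.
    DateTime.strptime("#{today.strftime('%m/%d/%Y')} #{time}", '%m/%d/%Y %I%p')
  end
  # Falls through to nil when neither a date nor a time is found.
end
```

Keeping the parsing separate from the fetching also means the slow part (the network) and the fast part (the parsing) can be changed independently.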
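Running update_all_screenings takes a very long time. I have a very similar script, written as a rake task, that has to do twice as much scraping and still finishes in about 10 minutes, whereas this one would take 8 hours. So how do I convert this file into a rake task? The whole app I'm building depends on this finishing in at most 1 hour.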
Here is the other script for reference:
require 'mechanize'
namespace :show do
  desc "add tv shows from web into database"
  task :scrape => :environment do
    puts 'scraping...'
    Show.delete_all
    agent = Mechanize.new
    agent.get 'http://www.tv.com/shows/sort/a_z/'
    agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
      agent.get letter_link[:href]
      letter = letter_link.text.upcase
      agent.page.search('//li[@class="show"]/a').each do |show_link|
        Show.create(title: show_link.text, url: 'http://tv.com' + show_link[:href].to_s + 'episodes/')
      end
      while next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]') do
        agent.get next_page_link[:href]
        agent.page.search('//li[@class="show"]/a').each do |show_link|
          Show.create(title: show_link.text, url: 'http://tv.com' + show_link[:href].to_s + 'episodes/')
        end
      end
    end
  end
end
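Rake is not a silver bullet; it won't make your code run any faster. What you can do is make the code itself more efficient. The main time sink in your code is calling open(url) for each show, one after another. If you can read all the URLs concurrently, the whole process should take a fraction of the time it takes now.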
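The idea of reading the URLs concurrently can also be sketched with nothing but Ruby's built-in Thread and Queue. This is a minimal sketch, assuming the job is dominated by waiting on the network (MRI threads overlap I/O waits, not CPU-bound parsing); fetch_all and its fetcher: and workers: parameters are hypothetical names, and the fetcher is injected so the sketch runs without network access.

```ruby
# Hypothetical helper: fetches every URL using a fixed pool of worker threads.
# fetcher - any callable that takes a URL and returns the response body
# workers - number of concurrent threads (an arbitrary default, not tuned)
# Returns a Hash of url => body (nil when that fetch raised).
def fetch_all(urls, fetcher:, workers: 10)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # once closed and drained, pop returns nil and the workers stop

  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      while (url = queue.pop)
        body = begin
          fetcher.call(url)
        rescue StandardError
          nil # mirror the original per-show "rescue -> nil" behaviour
        end
        lock.synchronize { results[url] = body }
      end
    end
  end

  threads.each(&:join)
  results
end
```

In the model it might be called as fetch_all(Show.pluck(:url), fetcher: ->(url) { URI.open(url).read }), then each body parsed and update_attribute called back on the main thread, so that no ActiveRecord work happens inside the worker threads.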
You can use a gem to handle this for you (the code below uses Typhoeus; other gems would work too).
DANGER! UNTESTED CODE!
I have no experience with this gem, but your code might look something like this:
require 'nokogiri'
require 'open-uri'
require 'typhoeus'
class Show < ActiveRecord::Base
  has_many :user_shows
  has_many :users, through: :user_shows
  def self.update_all_screenings
    hydra = Typhoeus::Hydra.hydra
    Show.all.each do |show|
      request = Typhoeus::Request.new(show.url, followlocation: true)
      request.on_complete do |response|
        show.update_attribute(:next_screening, Show.update_next_screening(response.body))
      end
      hydra.queue(request)
    end
    hydra.run
  end
  def self.update_next_screening(body)
    nextep = Nokogiri::HTML(body)
    ## Finds the title of the show, extracts the date of the next episode and converts it to a string ##
    begin
      title = nextep.at_css('h1').text
      date = nextep.at_css('.next_episode .highlight_date').text[/\d{1,2}\/\d{1,2}\/\d{4}/]
      date = date.to_s
      ## If the show airs today, the page shows a time instead of a date, so this checks whether
      ## there is a date. If there is, it is used; if not, today's date is combined
      ## with the time the show airs
      if date =~ /\d{1,2}\/\d{1,2}\/\d{4}/
        showtime = DateTime.strptime(date, "%m/%d/%Y")
      else
        date = DateTime.now.strftime("%D")
        time = nextep.at_css('.next_episode .highlight_date').text[/\dPM|\dAM/]
        time = time.to_s
        showtime = date + " " + time
        showtime = DateTime.strptime(showtime, "%m/%d/%y %l%p")
      end
      return showtime
    rescue
      return nil
    end
  end
end
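The above should collect all of the requests in one queue, run them concurrently, and act on each response as it arrives.
Try delayed_job. It's easy to implement, and your method will run in the background.
Hey @SachinPrasad, would that actually speed it up, or would it just run in the background until the job finishes?
It would only run the method in the background. delayed_job makes the method asynchronous, so you don't have to wait for it to finish.
@SachinPrasad Ah, OK. I was planning to use it once deployed. However, I need this script to finish within 1 hour every day; would delayed_job guarantee that?
@HarryLucas I don't know whether just converting it to a rake task is the way to speed it up. You could print timings at various points to find out which operation is taking so long.
Thanks for putting time into this! I tried it, but every line seems to return: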
performed EASY url= response_code=301
@HarryLucas Try request = Typhoeus::Request.new(show.url, followlocation: true)
(I've updated the post; it should now follow redirects.)
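Following the suggestion in the comments to print timings at different points, here is a minimal sketch that accumulates wall-clock time per labelled step using Process.clock_gettime. The timed helper and the :fetch/:parse labels are hypothetical, and the sleep only stands in for a real HTTP request.

```ruby
# Hypothetical helper: runs the block, adds its elapsed wall-clock time
# (in seconds) to timings[label], and returns the block's result unchanged.
def timed(label, timings)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  timings[label] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  result
end

# Demo: the sleep simulates a slow request, the length call a cheap parse.
timings = Hash.new(0.0)
3.times do
  body = timed(:fetch, timings) { sleep 0.05; "<h1>Show</h1>" }
  timed(:parse, timings) { body.length }
end
timings.each { |step, secs| puts format('%-6s %.3fs', step, secs) }
```

Wrapping open(url) in timed(:fetch, timings) and the Nokogiri calls in timed(:parse, timings) inside update_next_screening would show whether the network or the parsing dominates the 8 hours.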