Ruby on Rails 3: writing a caching version of Mechanize
I want a caching version of Mechanize. The idea is that #get(uri...) checks whether that uri has been fetched before and, if so, returns the response from the cache rather than hitting the web. If it is not in the cache, it hits the web and saves the response in the cache.
My naive approach doesn't work. (I probably don't need to mention that CachedWebPage is a subclass of ActiveRecord::Base):
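class CachingMechanize < Mechanize
  def get(uri, parameters = [], referer = nil, headers = {})
    # Return the cached contents if this uri was fetched before;
    # otherwise hit the web and save the response.
    page = if (record = CachedWebPage.find_by_uri(uri.to_s))
      record.contents
    else
      super.tap {|contents| CachedWebPage.create!(:uri => uri, :contents => contents)}
    end
    yield page if block_given?
    page
  end
end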
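This fails because the object returned by Mechanize#get() is a complex, circular structure that neither YAML nor JSON wants to serialize for storage in the database.
I realize that what I want is to capture the low-level content before Mechanize parses it.
- Is there a clean way to do this? I think I could use Mechanize's post_connect hook to access the raw page as it comes in, but I don't see how to subsequently pass the cached raw page to Mechanize for parsing.
- Is there some package I should be using that already does web page caching?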
class CachingMechanize < Mechanize
  def get(uri, parameters = [], referer = nil, headers = {})
    WebCache.with_web_cache(uri.to_s) { super }
  end
end
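Check whether the object you are trying to serialize and deserialize contains a lambda or proc. If you can (as I did in this case) replace the lambda with a method call on an object, you should be able to work around the problem.
Hope this helps someone else.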
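The point about lambdas can be seen without Mechanize. The following is a minimal sketch, not the author's code: `PageLike`, `LinkHandler`, and `dumpable?` are hypothetical names introduced for illustration, and it uses Ruby's built-in `Marshal` instead of YAML because `Marshal` rejects a Proc at dump time, which makes the failure easy to observe.

```ruby
# Hypothetical stand-ins for illustration; these are not Mechanize classes.
PageLike = Struct.new(:body, :handler)

# A plain object with a #call method, usable wherever a lambda was expected.
class LinkHandler
  def call(link, page)
    link
  end
end

# Returns true if Marshal can dump the object, false if it raises TypeError.
def dumpable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

with_lambda = PageLike.new("<html></html>", lambda { |link, page| link })
with_object = PageLike.new("<html></html>", LinkHandler.new)

lambda_dumpable = dumpable?(with_lambda) # false: a Proc cannot be marshalled
object_dumpable = dumpable?(with_object) # true: a plain object can

# Round-trip the serializable version and use the restored handler.
restored = Marshal.load(Marshal.dump(with_object))
restored_link = restored.handler.call("/about", nil)
```

Swapping the lambda for an object that responds to `call` keeps the calling code unchanged while making the containing object serializable.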
Update
In response to @Martin's request for the definition of WebCache, here it is:
# Simple model for caching pages fetched from the web.  Assumes
# a schema like this:
#
#   create_table "web_caches", :force => true do |t|
#     t.text     "key"
#     t.text     "value"
#     t.datetime "expires_at"
#     t.datetime "created_at", :null => false
#     t.datetime "updated_at", :null => false
#   end
#   add_index "web_caches", ["key"], :name => "index_web_caches_on_key", :unique => true
#
class WebCache < ActiveRecord::Base
  serialize :value

  # WebCache.with_web_cache(key) {
  #    ...body...
  # }
  #
  # Searches the web_caches table for an entry with a matching key.  If
  # found, and if the entry has not expired, the value for that entry is
  # returned.  If not found, or if the entry has expired, yield to the
  # body and cache the yielded value before returning it.
  #
  # Options:
  #   :expires_at sets the expiration date for this entry upon creation.
  #               Defaults to one year from now.
  #   :expired_prior_to overrides the value of 'now' when checking for
  #                     expired entries.  Mostly useful for unit testing.
  #
  def self.with_web_cache(key, opts = {})
    serialized_key = YAML.dump(key)
    expires_at = opts[:expires_at] || 1.year.from_now
    expired_prior_to = opts[:expired_prior_to] || Time.zone.now
    if (r = self.where(:key => serialized_key).where("expires_at > ?", expired_prior_to)).exists?
      # cache hit
      r.first.value
    else
      # cache miss
      yield.tap do |value|
        # Remove any expired row for this key first; the unique index on
        # key would otherwise reject the new row.
        self.where(:key => serialized_key).delete_all
        self.create!(:key => serialized_key, :value => value, :expires_at => expires_at)
      end
    end
  end

  # Prune expired entries.  Typically called by a cron job.
  def self.delete_expired_entries(expired_prior_to = Time.zone.now)
    self.where("expires_at < ?", expired_prior_to).destroy_all
  end
end
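The hit/miss/expiry logic of with_web_cache can be tried without a database. This is a rough sketch under stated assumptions, not the WebCache model itself: `MemoryCache`, `with_cache`, and `ONE_YEAR` are hypothetical names, storage is a plain Hash instead of a table, and plain `Time.now` replaces `Time.zone.now`.

```ruby
# MemoryCache is a hypothetical in-memory stand-in for the WebCache model.
class MemoryCache
  Entry = Struct.new(:value, :expires_at)
  ONE_YEAR = 365 * 24 * 60 * 60 # seconds

  def initialize
    @entries = {}
  end

  # Returns the cached value for key if present and unexpired; otherwise
  # yields, stores the block's result, and returns it.
  def with_cache(key, opts = {})
    now = opts[:expired_prior_to] || Time.now
    expires_at = opts[:expires_at] || (Time.now + ONE_YEAR)
    entry = @entries[key]
    if entry && entry.expires_at > now
      entry.value # cache hit
    else
      # cache miss or expired entry: overwrite whatever was stored
      yield.tap { |value| @entries[key] = Entry.new(value, expires_at) }
    end
  end

  # Removes entries whose expiry is earlier than the given time.
  def delete_expired_entries(expired_prior_to = Time.now)
    @entries.delete_if { |_key, entry| entry.expires_at < expired_prior_to }
  end

  def size
    @entries.size
  end
end

cache = MemoryCache.new
fetches = 0
2.times { cache.with_cache("http://example.com/") { fetches += 1; "<html>body</html>" } }
puts fetches # prints 1: the second call is served from the cache
```

Passing `:expired_prior_to` lets a test pretend that time has advanced, which is the same purpose the option serves in the model above.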
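Any chance of sharing your WebCache class or module? It's a wheel I would otherwise need to invent.
@MartinCapodici: added it to the answer. (It's so short that it seemed easier than writing a gist.)
Thanks @fearless_fool. In the meantime I wrote my own, which queries the database directly for storage, but yours is a useful reference for improving mine.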
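The full CachingMechanize, with the scheme handlers replaced by a plain object so that the fetched page can be serialized:
class CachingMechanize < Mechanize
  def initialize(*args)
    super
    sanitize_scheme_handlers
  end

  def get(uri, parameters = [], referer = nil, headers = {})
    WebCache.with_web_cache(uri.to_s) { super }
  end

  # private

  # Point every scheme at one SchemeHandler instance instead of a lambda.
  def sanitize_scheme_handlers
    scheme_handlers['http']     = SchemeHandler.new
    scheme_handlers['https']    = scheme_handlers['http']
    scheme_handlers['relative'] = scheme_handlers['http']
    scheme_handlers['file']     = scheme_handlers['http']
  end

  # Responds to #call like the lambda it replaces and returns the link unchanged.
  class SchemeHandler
    def call(link, page) ; link ; end
  end
end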
The error raised when the cached object contains a lambda or proc:
TypeError: allocator undefined for Proc