Ruby on Rails 3: Writing a caching version of Mechanize

ruby-on-rails-3 activerecord

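I'd like a caching version of Mechanize. The idea is that #get(uri...) checks whether that uri has been fetched before and, if so, returns the response from the cache rather than hitting the web. If the uri is not in the cache, it hits the web and saves the response in the cache.

My naive approach doesn't work. (I probably don't need to mention that CachedWebPage is a subclass of ActiveRecord::Base):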

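class CachingMechanize < Mechanize
  # ... (the start of #get, including the cache lookup, is missing from the source) ...
      ... :uri => uri, :contents => contents)}
    end
    yield page if block_given?
    page
  end
end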

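This fails because the object returned by Mechanize#get() is a complex, circular structure that neither YAML nor JSON wants to serialize for storage in the database.

I realize that what I want is to capture the low-level content before Mechanize parses it.

Is there a clean way to do this? I think I can use Mechanize's post_connect hook to get at the raw page as it comes in, but I don't see how to hand the cached raw page back to Mechanize for parsing afterwards.
Is there an existing package for web page caching that I should be using instead?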

It turns out the solution is simple, though not entirely clean. Caching the result of Mechanize#get() is straightforward:

class CachingMechanize < Mechanize
  def get(uri, parameters = [], referer = nil, headers = {})
    WebCache.with_web_cache(uri.to_s) { super }
  end
end

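with_web_cache stores whatever the block returns in a column declared with serialize, which uses YAML by default, so the fetched page has to survive a YAML round trip. A Proc does not. Check the object you are trying to serialize and deserialize for a lambda or proc. If you can replace the lambda with a method call on an object (as I did in this case), you should be able to work around the problem.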
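
The serialization problem can be reproduced without Mechanize. The sketch below is only an illustration: Holder and PassThroughHandler are hypothetical names, and it uses Marshal from the standard library rather than YAML so that it behaves the same across Ruby versions. The point it makes is the same: an object that holds a lambda cannot be round-tripped, while one that holds a plain object responding to call can.

```ruby
# Hypothetical stand-in for a page-like object that holds a callable handler.
Holder = Struct.new(:handler)

# A plain object with a #call method. Unlike a lambda, it carries no closure,
# so it can be dumped and loaded like any other object.
class PassThroughHandler
  def call(link, page)
    link
  end
end

# Returns true if the object survives a Marshal round trip, false if
# dumping raises a TypeError (as it does for anything containing a Proc).
def serializable?(object)
  Marshal.load(Marshal.dump(object))
  true
rescue TypeError
  false
end

with_lambda = Holder.new(lambda { |link, page| link })
with_object = Holder.new(PassThroughHandler.new)

puts serializable?(with_lambda)  # prints false
puts serializable?(with_object)  # prints true
```

Both handlers behave identically when called; only the one backed by a named class can be stored.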

I hope this helps someone else.

Update: in response to @Martin's request for the definition of WebCache, here it is:

# Simple model for caching pages fetched from the web.  Assumes
# a schema like this:
#
#   create_table "web_caches", :force => true do |t|
#     t.text     "key"
#     t.text     "value"
#     t.datetime "expires_at"
#     t.datetime "created_at", :null => false
#     t.datetime "updated_at", :null => false
#   end
#   add_index "web_caches", ["key"], :name => "index_web_caches_on_key", :unique => true
#
class WebCache < ActiveRecord::Base
  serialize :value

  # WebCache.with_web_cache(key) {
  #    ...body...
  # }
  #
  # Searches the web_caches table for an entry with a matching key.  If
  # found, and if the entry has not expired, the value for that entry is
  # returned.  If not found, or if the entry has expired, yield to the
  # body and cache the yielded value before returning it.
  #
  # Options:
  #   :expires_at sets the expiration date for this entry upon creation.
  #               Defaults to one year from now.
  #   :expired_prior_to overrides the value of 'now' when checking for
  #                     expired entries.  Mostly useful for unit testing.
  #
  def self.with_web_cache(key, opts = {})
    serialized_key = YAML.dump(key)
    expires_at = opts[:expires_at] || 1.year.from_now
    expired_prior_to = opts[:expired_prior_to] || Time.zone.now
    entries = where(:key => serialized_key)
    if (hit = entries.where("expires_at > ?", expired_prior_to).first)
      # cache hit
      hit.value
    else
      # cache miss: drop any expired entry for this key first, since the
      # unique index on key would otherwise reject the new row
      yield.tap do |value|
        entries.delete_all
        create!(:key => serialized_key, :value => value, :expires_at => expires_at)
      end
    end
  end

  # Prune expired entries.  Typically called by a cron job.
  def self.delete_expired_entries(expired_prior_to = Time.zone.now)
    self.where("expires_at < ?", expired_prior_to).destroy_all
  end

end
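
The hit, miss, and expiry behaviour of with_web_cache can be tried out without a database. The following is a minimal in-memory sketch, not the model above: MemoryCache, Entry, and with_cache are hypothetical names, and a Hash stands in for the web_caches table. It keeps the same two options, :expires_at and :expired_prior_to.

```ruby
require "yaml"

# Hypothetical in-memory stand-in for the web_caches table.
class MemoryCache
  ONE_YEAR = 365 * 24 * 60 * 60
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @entries = {}
  end

  # Returns the cached value for key if it exists and has not expired.
  # Otherwise yields, stores the yielded value, and returns it.
  def with_cache(key, opts = {})
    serialized_key = YAML.dump(key)
    expires_at = opts[:expires_at] || Time.now + ONE_YEAR
    expired_prior_to = opts[:expired_prior_to] || Time.now
    entry = @entries[serialized_key]
    return entry.value if entry && entry.expires_at > expired_prior_to
    yield.tap { |value| @entries[serialized_key] = Entry.new(value, expires_at) }
  end
end

cache = MemoryCache.new
calls = 0
fetch = lambda { calls += 1; "body #{calls}" }

first  = cache.with_cache("http://example.com/") { fetch.call }  # miss, runs the block
second = cache.with_cache("http://example.com/") { fetch.call }  # hit, block is skipped

# Pretend two years have passed: the entry has expired, so the block runs again.
later = Time.now + 2 * MemoryCache::ONE_YEAR
third = cache.with_cache("http://example.com/", :expired_prior_to => later) { fetch.call }
```

Overriding :expired_prior_to is what makes the expiry path testable without waiting or stubbing the clock, which is the use the option's comment describes.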

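Any chance you could share your WebCache class or module? It's a wheel I would otherwise need to reinvent.

@MartinCapodici: added it to the answer. (It's so short that it seemed easier than writing a gist.)

Thanks @fearless_fool. In the meantime I wrote my own, which queries the database directly as its storage, but yours is a useful reference for improving mine.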

The full CachingMechanize, which replaces the scheme handlers with a plain object so that the fetched page can be serialized:

class CachingMechanize < Mechanize

  def initialize(*args)
    super
    sanitize_scheme_handlers
  end

  def get(uri, parameters = [], referer = nil, headers = {})
    WebCache.with_web_cache(uri.to_s) { super }
  end

  # private

  def sanitize_scheme_handlers
    scheme_handlers['http']      = SchemeHandler.new
    scheme_handlers['https']     = scheme_handlers['http']
    scheme_handlers['relative']  = scheme_handlers['http']
    scheme_handlers['file']      = scheme_handlers['http']
  end

  class SchemeHandler
    def call(link, page) ; link ; end
  end

end

The error that signals this problem is:

TypeError: allocator undefined for Proc
