ActiveRecord find_each combined with limit and order

sql ruby-on-rails activerecord

I'm trying to run a query over about 50,000 records using ActiveRecord's find_each method, but it seems to ignore my other parameters, like so:

Thing.active.order("created_at DESC").limit(50000).find_each {|t| puts t.id }

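Instead of stopping at 50,000 records and sorting by created_at, the resulting query runs over the entire dataset.

Is there a way to get behavior similar to find_each, but with a total maximum limit and respecting my sort criteria?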
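Under the hood, find_each uses find_in_batches.
You cannot choose the order of the records: as described for find_in_batches, the order is automatically set to ascending on the primary key ("id ASC") to make the batch ordering work.
However, the scope's conditions are still applied, so what you can do is: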
Thing.active.find_each(batch_size: 50000) { |t| puts t.id }

As for the limit, it has not been implemented yet.

To answer your second question, you can build the logic yourself:
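total_records = 50000
batch = 1000
# Step through the offsets 0, 1000, 2000, ...; the range ends at
# total_records - 1 so that a final partial batch is not skipped.
(0..(total_records - 1)).step(batch) do |i|
  puts Thing.active.order("created_at DESC").offset(i).limit(batch).to_sql
end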
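The offsets such a loop visits depend on how the upper bound of the range is chosen. The plain-Ruby check below needs no database; the two helper method names are made up for illustration. It compares a range that ends at total_records - batch with one that ends at total_records - 1.

```ruby
# Offsets produced when the range ends at total_records - batch.
def offsets_up_to_last_full_batch(total_records, batch)
  (0..(total_records - batch)).step(batch).to_a
end

# Offsets produced when the range ends at total_records - 1.
def offsets_covering_all_records(total_records, batch)
  (0..(total_records - 1)).step(batch).to_a
end

p offsets_up_to_last_full_batch(2_500, 1000)  # [0, 1000]        (last 500 records missed)
p offsets_covering_all_records(2_500, 1000)   # [0, 1000, 2000]
p offsets_up_to_last_full_batch(500, 1000)    # []               (nothing is processed)
p offsets_covering_all_records(500, 1000)     # [0]
```

When total_records is an exact multiple of batch, both variants produce the same offsets.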

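The documentation says that find_each and find_in_batches don't retain the sort order and limit because:

- Sorting ASC on the PK is used to make the batch ordering work.
- Limit is used to control the batch sizes.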
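To see why a custom order cannot survive this scheme, here is a simplified in-memory model of primary-key batching. It is only a sketch of the idea, not Rails' actual implementation; Row and each_in_pk_batches are hypothetical names.

```ruby
# Each "row" has an id (primary key) and a created_at value.
Row = Struct.new(:id, :created_at)

# Walks the rows in batches, always resuming after the last id seen.
def each_in_pk_batches(rows, batch_size)
  last_id = nil
  loop do
    # Equivalent of: WHERE id > last_id ORDER BY id ASC LIMIT batch_size
    batch = rows.select { |r| last_id.nil? || r.id > last_id }
                .sort_by(&:id)
                .first(batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last.id
  end
end

rows = [Row.new(3, 30), Row.new(1, 10), Row.new(5, 20), Row.new(2, 50), Row.new(4, 40)]
visited = []
each_in_pk_batches(rows, 2) { |row| visited << row.id }
p visited # [1, 2, 3, 4, 5]
```

Each batch can only be resumed from the last id of the previous one because the rows are sorted by id, and the LIMIT slot is already taken by the batch size.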

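You could write your own version of these functions, like @rorra did. However, you can run into trouble when the objects are mutated. For example, if you sort by created_at and save an object, it might show up again in a later batch. Similarly, you might skip objects because the order of the results has changed by the time the query for the next batch is executed. Only use that solution with read-only objects.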
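The hazard can be reproduced without a database. The sketch below is a plain-Ruby simulation; Item, page, and the updated_at sort column are illustrative assumptions. It re-runs an ordered offset/limit "query" for every batch while each processed item has its sort key bumped.

```ruby
Item = Struct.new(:id, :updated_at)

# Stand-in for: ORDER BY updated_at ASC OFFSET offset LIMIT limit
def page(items, offset, limit)
  items.sort_by(&:updated_at).drop(offset).first(limit)
end

items = (1..6).map { |n| Item.new(n, n) } # item n starts with updated_at = n
clock = 100
seen = []

(0...items.size).step(2) do |offset|
  page(items, offset, 2).each do |item|
    seen << item.id
    clock += 1
    item.updated_at = clock # "saving" the item moves it to the end of the ordering
  end
end

p seen                   # [1, 2, 5, 6, 5, 6]
p (1..6).to_a - seen     # [3, 4] were never processed
```

Items 3 and 4 slide into positions that earlier offsets already covered, while items 5 and 6 are fetched twice.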
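My main concern was that I didn't want to load 30,000+ objects into memory at once; the execution time of the query itself did not worry me. So I used a solution that executes the original query but only caches the ids. It then splits the array of ids into chunks and queries/instantiates the objects chunk by chunk. This way you can safely mutate the objects, because the sort order is kept in memory.
Here is a minimal example similar to what I did: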
batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace .order('created_at DESC') with your own scope
ids.each_slice(batch_size) do |chunk|
  # FIELD() is MySQL-specific; it returns the rows in the same order as the ids in chunk
  Thing.where(id: chunk).order(Arel.sql("FIELD(id, #{chunk.join(',')})")).each do |thing|
    # Do things with thing
  end
end

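The trade-offs of this solution are:

- The complete query is executed to get the ids
- An array of all the ids is kept in memory
- It uses the MySQL-specific FIELD() function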
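If the dependency on FIELD() is a problem, one database-independent alternative is to restore the order in Ruby after each chunk is loaded. The sketch below uses in-memory stand-ins: Record, fetch_unordered, and in_id_order are hypothetical names, and fetch_unordered plays the role of a WHERE id IN (...) lookup whose row order is not guaranteed.

```ruby
Record = Struct.new(:id)

# Stand-in for a WHERE id IN (...) query; rows come back in table order.
def fetch_unordered(table, ids)
  table.select { |r| ids.include?(r.id) }
end

# Re-sorts the loaded records to match the order of the given ids.
def in_id_order(records, ids)
  by_id = records.each_with_object({}) { |r, h| h[r.id] = r }
  ids.map { |id| by_id[id] }.compact # compact drops ids that no longer exist
end

table = (1..10).map { |n| Record.new(n) }
ordered_ids = [7, 2, 9, 4, 1] # e.g. the result of an ordered pluck(:id)

result = []
ordered_ids.each_slice(2) do |chunk|
  result.concat(in_id_order(fetch_unordered(table, chunk), chunk).map(&:id))
end
p result # [7, 2, 9, 4, 1]
```

In a Rails app the lookup hash could be built with index_by(&:id) from ActiveSupport instead of each_with_object.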

Hope this helps!
I was looking for the same behavior and came up with this solution. It does NOT order by created_at, but I thought I would post it anyway.
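max_records_to_retrieve = 50000
# Use the highest id rather than the row count: with gaps in the ids, count is lower than the last id
last_index = Thing.maximum(:id).to_i
start_index = [(last_index - max_records_to_retrieve + 1), 0].max
Thing.active.find_each(start: start_index) do |u|
  # do stuff
end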

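Drawbacks of this approach:
- You need 2 queries (the first one should be fast)
- This guarantees a maximum of 50K records, but if ids have been skipped you will get fewer.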
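The "at most N" guarantee holds when the starting point is derived from the highest id; deriving it from the row count can overshoot once ids have gaps. A small plain-Ruby check, with made-up ids and a window of 5 instead of 50K:

```ruby
# ids that still exist in a hypothetical table; 12, 15, 16 and 19 were deleted
existing_ids = (1..20).to_a - [12, 15, 16, 19]
max_records  = 5

# Start derived from the highest id (inclusive lower bound of the window)
start_from_max_id = [existing_ids.max - max_records + 1, 0].max
window_from_max_id = existing_ids.select { |id| id >= start_from_max_id }

# Start derived from the number of rows instead
start_from_count = [existing_ids.size - max_records, 0].max
window_from_count = existing_ids.select { |id| id >= start_from_count }

p window_from_max_id # [17, 18, 20]             -> fewer than 5 because of the gaps
p window_from_count  # [11, 13, 14, 17, 18, 20] -> 6 records, more than the intended 5
```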
Retrieve the ids first, then process them in groups:
ordered_photo_ids = Photo.order(likes_count: :desc).pluck(:id)

ordered_photo_ids.in_groups_of(1000, false).each do |photo_ids|
  photos = Photo.order(likes_count: :desc).where(id: photo_ids)

  # ...
end

It's important to also add the ORDER BY clause to the inner call.
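One option is to put an implementation tailored to your particular model into the model itself (speaking of which, id is usually a better choice for ordering records; created_at may contain duplicates):
config/initializers/extensions.rb: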
ActiveRecord::Batches.module_eval do
  def find_each_desc limit
    batch_size = 1000
    i = 1
    records = self.order(id: :desc).limit(batch_size)
    while records.any?
      records.each do |task|
        yield task, i
        i += 1
        return if i > limit
      end
      records = self.order(id: :desc).where('id < ?', records.last.id).limit(batch_size)
    end
  end
end

ActiveRecord::Querying.module_eval do
  delegate :find_each_desc, :to => :all
end

require "active_record_extensions"

P.S. The file placement follows the recommendation from another answer.
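The same idea, walking ids downwards in batches and stopping after a total limit, can be tried on an in-memory array first. This is a sketch under assumptions: Entry and each_desc are hypothetical names, and the array filter stands in for the SQL query shown in the comment.

```ruby
Entry = Struct.new(:id)

# Yields rows in descending id order, at most `limit` of them in total.
def each_desc(rows, limit:, batch_size: 3)
  yielded = 0
  upper = nil
  loop do
    # Equivalent of: WHERE id < upper ORDER BY id DESC LIMIT batch_size
    batch = rows.select { |r| upper.nil? || r.id < upper }
                .sort_by { |r| -r.id }
                .first(batch_size)
    return if batch.empty?

    batch.each do |row|
      return if yielded >= limit
      yield row
      yielded += 1
    end
    upper = batch.last.id
  end
end

rows = [4, 9, 1, 7, 3, 8, 2].map { |id| Entry.new(id) }
ids = []
each_desc(rows, limit: 5) { |row| ids << row.id }
p ids # [9, 8, 7, 4, 3]
```

Because each batch resumes from the last id it saw rather than from an offset, rows inserted or deleted while iterating do not shift the remaining batches.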
You can iterate backwards using a standard Ruby iterator:
Thing.last.id.step(0,-1000) do |i|
  Thing.where(id: (i-1000+1)..i).order('id DESC').each do |thing|
    #...
  end
end

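Note: the +1 is needed because the BETWEEN that the range turns into in the query includes both boundaries, but we need to include only one of them.
Of course, with this approach a batch may contain fewer than 1000 records because some of them have been deleted, but that is fine in my case.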
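To check the boundaries, the id ranges generated by this kind of backwards stepping can be printed without touching the database. The helper name below is made up for illustration.

```ruby
# Builds the inclusive id ranges visited when stepping down from last_id.
def descending_id_ranges(last_id, batch)
  last_id.step(0, -batch).map { |i| (i - batch + 1)..i }
end

p descending_id_ranges(2500, 1000)
# [1501..2500, 501..1500, -499..500]
```

The ranges do not overlap, so no id is queried twice. The last range may reach below 1, which is harmless when ids start at 1.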
You can try a gem for this.
From its documentation, you can do something like this:
Users.where(country_id: 44).order(:joined_at).offset(200).as_batches do |user|
  user.party_all_night!
end

Do it in a single query and avoid iterating:
User.offset(2).order('name DESC').last(3)

This will produce a query like this:
SELECT "users".* FROM "users" ORDER BY name ASC LIMIT $1 OFFSET $2 [["LIMIT", 3], ["OFFSET", 2]]
Using a library or some other approach, this will be easy.
- Create a batch loader class.
- Create a repository.
- Use the repository.
As @Kirk pointed out in one of the comments, find_each supports limit in newer Rails versions.
Example from the changelog:
Post.limit(10_000).find_each do |post|
  # ...
end

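The documentation says:
The limit is honored, and if present there is no requirement on the batch size: it can be less than, equal to, or greater than the limit.
(Setting a custom order is still not supported, though.)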
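As a rough illustration of how a total limit and a batch size can interact, the arithmetic below splits a limit into per-batch fetch sizes. It assumes the table has at least `limit` rows and is not taken from Rails' source; batch_sizes is a made-up helper.

```ruby
# Returns how many records each batch would fetch for a given total limit.
def batch_sizes(limit, batch_size)
  sizes = []
  remaining = limit
  while remaining > 0
    take = [batch_size, remaining].min
    sizes << take
    remaining -= take
  end
  sizes
end

p batch_sizes(2500, 1000) # [1000, 1000, 500]  batch size smaller than the limit
p batch_sizes(1000, 1000) # [1000]             batch size equal to the limit
p batch_sizes(500, 1000)  # [500]              batch size greater than the limit
```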
Adding find_in_batches_with_order solved my use case, where I already had the ids but needed batching and ordering. It was inspired by @dirk-geurs's solution.
# Create file config/initializers/find_in_batches_with_order.rb with the following code.
ActiveRecord::Batches.class_eval do
  ## Only flat order structure is supported now
  ## example: [:forename, :surname] is supported but [:forename, {surname: :asc}] is not supported
  def find_in_batches_with_order(ids: nil, order: [], batch_size: 1000)
    relation = self
    arrangement = order.dup
    index = order.find_index(:id)

    unless index
      arrangement.push(:id)
      index = arrangement.length - 1
    end

    ids ||= relation.order(*arrangement).pluck(*arrangement).map{ |tupple| tupple[index] }
    ids.each_slice(batch_size) do |chunk_ids|
      chunk_relation = relation.where(id: chunk_ids).order(*order)
      yield(chunk_relation)
    end
  end
end

I'm also leaving this here as a Gist.
I had the same problem with a query that uses DISTINCT ON, where you need an ORDER BY on that field, so this is my approach with Postgres:
def filtered_model_ids
  Model.joins(:father_model)
       .select('DISTINCT ON (model.field) model.id')
       .order(:field)
       .map(&:id)
end

def processor
  filtered_model_ids.each_slice(BATCH_SIZE).lazy.each do |batch|
    Model.find(batch).each do |record|
      # Code
    end
  end
end

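Rails 6.1 added support for descending order in find_each, find_in_batches, and in_batches (through their order: option, e.g. order: :desc).
Is there any other way to achieve this?
@jan hettich, in my original answer I wrote that find_in_batches does not support the limit option, and I also pointed to the pull request that implements it, but it was never accepted/merged.
This solution will get you into trouble if you mutate the objects while processing the batches. If the mutation affects the sort order in the database, you may skip some objects or process some twice.
total_records - batch can be negative when total_records is smaller than the batch size, which gives an empty range. I'd call abs to make sure the loop runs at least once, e.g. (0..(total_records - batch).abs).
When total_records is not a multiple of batch (and even when it is), your range should be (0..(total_records - 1)) so that you don't miss the last batch.
Is there any particular reason you haven't accepted any answer?
Sorry, I forgot :-\
In batch operations like find_each and find_in_batches, the scope's order and limit are ignored; they are forced to the batch order and batch size.
Just like the accepted answer, this works in PostgreSQL. Also, nice job keeping the answer concise.
That's a start.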
repo = ThingRepository.new
repo.batch_changes(5000).each do |g|
  g.each do |t|
    #...
  end
end
