Ruby on Rails ActiveRecord - copying a join table


For a demo, I'm running a rake task that copies some live data. One part of it takes a very long time to run. The models look like this:

class Host
  has_many :event_attendees, inverse_of: :host, dependent: :destroy
  has_many :events, -> { distinct }, through: :event_attendees
end
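I'm copying hosts, attendees, and events while keeping track of their IDs (so that I can then copy the existing event attendees). This part seems to work fine: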

def copy_events
  log("Copying events...")

  # Maps each source event id to the id of its copy.
  @event_ids_h = {}

  ActiveRecord::Base.transaction do
    Event.
      where(organization_id: source_org_id).
      distinct.
      find_in_batches do |events|
        # Build unsaved copies of this batch, dropping the original primary key.
        fake_events = events.map do |event|
          atts = event.attributes.except('id').merge(
            ...atts
          )
          Event.new(atts)
        end

        Event.import(
          fake_events,
          validate: true,
          timestamps: false,
        )

        # Pair old ids with new ids by position within the batch.
        @event_ids_h = @event_ids_h.merge(Hash[events.map(&:id).zip(fake_events.map(&:id))])
      end
  end
end
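The id bookkeeping in copy_events can be shown in isolation. The sketch below is plain Ruby with made-up ids (the batches array and its values are hypothetical, not taken from the question). It assumes, as the code above does, that the import call fills in the new ids on the copied objects in the same order as the source records.

```ruby
# Hypothetical ids for two batches: source records and their copies, in the same order.
batches = [
  { old_ids: [101, 102], new_ids: [901, 902] },
  { old_ids: [103],      new_ids: [903] }
]

id_map = {}
batches.each do |batch|
  # zip pairs each old id with the new id at the same position.
  id_map.merge!(batch[:old_ids].zip(batch[:new_ids]).to_h)
end

# Look up the copy of source record 102.
copied_id = id_map.fetch(102)
```

Using merge! updates the hash in place instead of building a new hash for every batch, which matters little here but avoids repeated copying as the map grows.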
The next part works, but it takes forever: about 2.5 hours for roughly 600,000 records.

def copy_event_attendees
  log("Copying event attendee data...")

  # Load the source join rows, then swap each id for the id of its copy.
  arr = EventAttendee.
    where(attendee_id: @attendee_ids_h.keys, event_id: @event_ids_h.keys, host_id: @host_ids_h.keys).
    pluck(:attendee_id, :event_id, :host_id).
    map { |ids| build_event_attendee_row(ids) }

  unless arr.empty?
    values_s = Arel::Nodes::ValuesList.new(arr).to_sql

    # Insert all copied rows with a single multi-row INSERT.
    ActiveRecord::Base.connection.insert(<<~SQL)
      INSERT INTO event_attendees (attendee_id, event_id, host_id, created_at) #{values_s}
    SQL
  end
end

# Replaces the source ids in place with the copied ids and appends created_at.
def build_event_attendee_row(ids)
  ids[0] = @attendee_ids_h[ids.first]
  ids[1] = @event_ids_h[ids.second]
  ids[2] = @host_ids_h[ids.third]
  ids.push(Time.current)
end
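As an illustration of the remapping step, here is a minimal plain-Ruby sketch with made-up id maps. The names remap_row and BATCH_SIZE are hypothetical, and splitting the rows with each_slice is only one possible way to issue several smaller INSERT statements instead of one very large one; it is not part of the original code.

```ruby
# Hypothetical old-id => new-id maps, standing in for @attendee_ids_h and friends.
attendee_ids = { 1 => 11, 2 => 12, 3 => 13 }
event_ids    = { 5 => 50 }
host_ids     = { 7 => 70 }

# Returns a new row with every source id replaced by the id of its copy.
# fetch raises KeyError for an unknown id instead of silently producing nil.
def remap_row(row, attendee_ids, event_ids, host_ids, created_at)
  attendee_id, event_id, host_id = row
  [attendee_ids.fetch(attendee_id), event_ids.fetch(event_id), host_ids.fetch(host_id), created_at]
end

source_rows = [[1, 5, 7], [2, 5, 7], [3, 5, 7]]
now = Time.now
remapped = source_rows.map { |row| remap_row(row, attendee_ids, event_ids, host_ids, now) }

# Assumed batch size; each slice would become one multi-row INSERT.
BATCH_SIZE = 2
insert_batches = remapped.each_slice(BATCH_SIZE).to_a
```

Unlike build_event_attendee_row, this version does not mutate its argument, which makes it easier to test on its own.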
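Do you have any database indexes on the event_attendees table? How long does this query take to complete? You can check that in the log file. Try running
EventAttendee.where(attendee_id: @attendee_ids_h.keys, event_id: @event_ids_h.keys, host_id: @host_ids_h.keys).explain
to see what the query execution plan looks like.

@KamilGwóźdź yes, there are indexes on attendee_id, on host_id, and a UNIQUE index on (event_id, host_id, attendee_id).

How long does the query take to complete? Could it be that
map { |ids| build_event_attendee_row(ids) }
is what takes so long?

@KamilGwóźdź build_event_attendee_row looks the IDs up in memory. But when I run the script, according to the heroku logs I'm tailing, it is the query that takes a long time (about 2.5 hours). I don't have all the IDs available to inspect the query plan manually, but with some dummy arrays it looks like this:
Index Scan using index_event_attendees_on_attendee_id on event_attendees (cost=0.08..8.39 rows=1 width=46) Index Cond: (attendee_id = ANY ('{9,8,7}'::bigint[])) Filter: ((event_id = ANY ('{3,2,1}'::bigint[])) AND (host_id = ANY ('{1,2,3}'::integer[])))

The query planner relies on heuristics that depend on the data you have. With your small data set the index is used, but with large arrays you may get a sequential scan instead. Even so, this query is unlikely to take 2.5 hours. Can you share the heroku logs?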
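One way to settle whether the query or the in-memory mapping dominates is to time each phase separately. The following is a rough sketch in plain Ruby: the timed helper is hypothetical, and the two blocks are stand-ins for the where/pluck call and the map over build_event_attendee_row, not the real workload.

```ruby
# Runs the block, records its wall-clock duration under label, and returns its result.
def timed(label, timings)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  timings[label] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  result
end

timings = {}

# Stand-in for the database query that plucks the id triples.
rows = timed(:query, timings) { (1..10_000).map { |i| [i, i, i] } }

# Stand-in for remapping every triple to the ids of the copies.
remapped = timed(:remap, timings) { rows.map { |a, e, h| [a + 1, e + 1, h + 1] } }

# The phase that took the longest.
slowest = timings.max_by { |_label, seconds| seconds }.first
```

In the rake task, the same helper could wrap the pluck, the map, and the INSERT, with the resulting timings written out through the existing log method.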
class EventAttendee
  belongs_to :event, inverse_of: :event_attendees, dependent: :destroy
  belongs_to :attendee, inverse_of: :event_attendees, dependent: :destroy
  belongs_to :host, inverse_of: :event_attendees, dependent: :destroy
end