Ruby on Rails ActiveRecord - copying a join table
For a demo, I'm running a rake task that copies some live data. One part of it takes a very long time to run. The models look like this:
class Host
  has_many :event_attendees, inverse_of: :host, dependent: :destroy
  has_many :events, -> { distinct }, through: :event_attendees
end
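I'm copying hosts, attendees, and events while keeping track of their old-to-new ID mappings, so that I can then copy the existing event attendees. This part seems to work fine: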
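def copy_events
  log("Copying events...")
  @event_ids_h = {}
  ActiveRecord::Base.transaction do
    Event.
      where(organization_id: source_org_id).
      distinct.
      find_in_batches do |events|
        fake_events = events.map do |event|
          atts = event.attributes.except('id').merge(
            # ...attribute overrides elided in the original...
          )
          Event.new(atts)
        end
        # Event.import comes from the activerecord-import gem; the ID mapping
        # below relies on it setting the new primary keys on fake_events
        # (it does this on PostgreSQL).
        Event.import(
          fake_events,
          validate: true,
          timestamps: false,
        )
        # Map each source event id to the id of its copy.
        @event_ids_h.merge!(events.map(&:id).zip(fake_events.map(&:id)).to_h)
      end
  end
end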
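The ID bookkeeping in copy_events depends on two things: events and fake_events must stay in the same order, and every copy must receive an id. Below is a minimal plain-Ruby sketch of that step with explicit checks added. It assumes that a copy rejected by validate: true is left with a nil id. build_id_map is a hypothetical helper name and is not part of the original code.

```ruby
# Hypothetical helper (not from the original code): pairs each source ID with
# the ID of its copy. Assumes both arrays are in the same order, as events and
# fake_events are in copy_events, and (assumption) that a copy which was not
# inserted still has a nil id.
def build_id_map(source_ids, copied_ids)
  if source_ids.length != copied_ids.length
    raise ArgumentError, "expected #{source_ids.length} copied ids, got #{copied_ids.length}"
  end

  # zip pairs the two lists by position: [[old_id, new_id], ...]
  pairs = source_ids.zip(copied_ids)

  # Fail loudly instead of silently mapping an old id to nil.
  missing = pairs.select { |_old_id, new_id| new_id.nil? }.map(&:first)
  raise ArgumentError, "no copy for source ids #{missing.inspect}" unless missing.empty?

  pairs.to_h
end

# Usage with made-up IDs, one call per batch as with find_in_batches:
event_ids_h = {}
event_ids_h.merge!(build_id_map([1, 2], [101, 102]))
event_ids_h.merge!(build_id_map([3], [103]))
```

A silent nil in the map would otherwise surface only later, as NULL foreign keys in the copied join rows.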
The next part works... but it drags on: roughly 2.5 hours for about 600k records.
def copy_event_attendees
  log("Copying event attendee data...")
  arr = EventAttendee.
    where(attendee_id: @attendee_ids_h.keys, event_id: @event_ids_h.keys, host_id: @host_ids_h.keys).
    pluck(:attendee_id, :event_id, :host_id).
    map { |ids| build_event_attendee_row(ids) }
  unless arr.empty?
    values_s = Arel::Nodes::ValuesList.new(arr).to_sql
    ActiveRecord::Base.connection.insert(<<~SQL)
      INSERT INTO event_attendees (attendee_id, event_id, host_id, created_at) #{values_s}
    SQL
  end
end
def build_event_attendee_row(ids)
  ids[0] = @attendee_ids_h[ids.first]
  ids[1] = @event_ids_h[ids.second]
  ids[2] = @host_ids_h[ids.third]
  ids.push(Time.current)
end
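build_event_attendee_row does only three hash lookups per row, so the remapping step should scale linearly with the number of rows. Below is a sketch of the same remapping written as pure functions that leave the plucked arrays unmodified. As a defensive assumption, it skips any row whose IDs have no copy instead of inserting NULLs. remap_attendee_row and remap_attendee_rows are hypothetical names, and created_at is left to the insert step.

```ruby
# Hypothetical helpers (not from the original code) doing the same job as
# build_event_attendee_row: translate one plucked
# [attendee_id, event_id, host_id] triple into the IDs of the copies.
def remap_attendee_row(row, attendee_map, event_map, host_map)
  attendee_id, event_id, host_id = row
  # Three O(1) hash lookups; the input array is left untouched.
  new_row = [attendee_map[attendee_id], event_map[event_id], host_map[host_id]]
  # Defensive assumption: skip rows with an unmapped ID instead of inserting NULL.
  new_row.include?(nil) ? nil : new_row
end

def remap_attendee_rows(rows, attendee_map, event_map, host_map)
  # filter_map (Ruby 2.7+) maps and drops the nil results in one pass.
  rows.filter_map { |row| remap_attendee_row(row, attendee_map, event_map, host_map) }
end

# Usage with made-up IDs: attendee 3 has no copy, so its row is dropped.
rows = [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
remapped = remap_attendee_rows(
  rows,
  { 1 => 11, 2 => 12 },
  { 10 => 110, 20 => 120, 30 => 130 },
  { 100 => 1100, 200 => 1200, 300 => 1300 }
)
```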
Do you have any database indexes on the event_attendees table? How long does this query take to complete? You can check that in the log file. Try running EventAttendee.where(attendee_id: @attendee_ids_h.keys, event_id: @event_ids_h.keys, host_id: @host_ids_h.keys).explain to see what the query execution plan looks like.
@KamilGwóźdź yes, there are indexes on attendee_id and on host_id, plus a UNIQUE index on (event_id, host_id, attendee_id).
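How long does the query itself take? Could it be map { |ids| build_event_attendee_row(ids) } that takes so long?
@KamilGwóźdź build_event_attendee_row only looks up the IDs in memory. But according to the Heroku logs I followed while the script ran, the query takes a long time (~2.5 hours). I don't have all the IDs available to check the query plan by hand, but with some fake arrays it looks like this:
Index Scan using index_event_attendees_on_attendee_id on event_attendees (cost=0.08..8.39 rows=1 width=46)
  Index Cond: (attendee_id = ANY ('{9,8,7}'::bigint[]))
  Filter: ((event_id = ANY ('{3,2,1}'::bigint[])) AND (host_id = ANY ('{1,2,3}'::integer[])))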
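The query planner relies on heuristics and on your data: with your small fake arrays it uses the index, but with large arrays it may choose a sequential scan instead. Still, it's unlikely that this query alone takes 2.5 hours. Can you share the Heroku logs?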
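On the point about large arrays: copy_event_attendees sends a single INSERT whose VALUES list holds every plucked row, about 600k here. Below is a sketch that splits this into fixed-size batches. It assumes the remapped rows are integer triples that share one created_at. batched_attendee_inserts and the 10_000 default are hypothetical choices, not tuned values.

```ruby
# Hypothetical helper (not from the original code): turns remapped
# [attendee_id, event_id, host_id] rows into several INSERT statements of at
# most batch_size rows each, instead of one statement covering every row.
# Assumptions: all rows share one created_at, and every ID is coerced with
# Integer(), so only integer literals are interpolated into the SQL.
def batched_attendee_inserts(rows, created_at, batch_size: 10_000)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  # getutc returns a UTC copy without modifying the caller's Time object.
  timestamp = created_at.getutc.strftime("%Y-%m-%d %H:%M:%S")

  rows.each_slice(batch_size).map do |batch|
    values = batch.map do |attendee_id, event_id, host_id|
      "(#{Integer(attendee_id)}, #{Integer(event_id)}, #{Integer(host_id)}, '#{timestamp}')"
    end
    "INSERT INTO event_attendees (attendee_id, event_id, host_id, created_at) " \
      "VALUES #{values.join(', ')}"
  end
end

# Usage with made-up rows and a batch size of 2 (three rows -> two statements).
statements = batched_attendee_inserts(
  [[11, 110, 1100], [12, 120, 1200], [13, 130, 1300]],
  Time.utc(2020, 1, 2, 3, 4, 5),
  batch_size: 2
)
```

In the rake task, each statement could be run with ActiveRecord::Base.connection.execute. Whether batching helps at all depends on where the 2.5 hours are actually spent. EXPLAIN ANALYZE on the real SELECT, or the Heroku logs the commenter asked for, would show that.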
The EventAttendee join model:
class EventAttendee
  belongs_to :event, inverse_of: :event_attendees, dependent: :destroy
  belongs_to :attendee, inverse_of: :event_attendees, dependent: :destroy
  belongs_to :host, inverse_of: :event_attendees, dependent: :destroy
end