Ruby on Rails: How do I import a large amount of data into Rails?

To load small amounts of data, I have been using a rake task that reads important data from a CSV-style text file into Rails:

desc "Import users." 
task :import_users => :environment do
    File.open("users.txt", "r").each do |line|
        name, age, profession = line.strip.split("\t")
        u = User.new(:name => name, :age => age, :profession => profession)
        u.save
    end
end

However, for larger files (around 50,000 records) this is very slow. Is there a faster way to import the data?

You may want to take a look at and check out.

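Without any extra libraries (I agree that a bulk import with AR extensions should be faster, although AR:Extension skips model validation), you can add a bit of concurrency and take advantage of a multi-core machine.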

# Returns the number of processors on Linux, OS X or Windows.
def number_of_processors
  if RUBY_PLATFORM =~ /linux/
    return `cat /proc/cpuinfo | grep processor | wc -l`.to_i
  elsif RUBY_PLATFORM =~ /darwin/
    return `sysctl -n hw.logicalcpu`.to_i
  elsif RUBY_PLATFORM =~ /win32/
    # this works for windows 2000 or greater
    require 'win32ole'
    wmi = WIN32OLE.connect("winmgmts://")
    wmi.ExecQuery("select * from Win32_ComputerSystem").each do |system| 
      begin
        processors = system.NumberOfLogicalProcessors
      rescue
        processors = 0
      end
      return [system.NumberOfProcessors, processors].max
    end
  end
  raise "can't determine 'number_of_processors' for '#{RUBY_PLATFORM}'"
end

desc "Import users." 
task :fork_import_users => :environment do
  procs = number_of_processors
  lines = IO.readlines('users.txt') # same file as in the question
  nb_lines = lines.size
  slices = nb_lines / procs
  procs.times do
    subset = lines.slice!(0..slices)
    fork do
      subset.each do |line|
        name, age, profession = line.strip.split("\t")
        u = User.new(:name => name, :age => age, :profession => profession)
        u.save
      end
    end
  end
  Process.waitall
end

On my machine, which has two cores, the fork version takes:

real    1m41.974s
user    1m32.629s
sys     0m7.318s

With your version:

real    2m56.401s
user    1m21.953s
sys     0m7.529s
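
The slicing step in the task above can be looked at on its own. What follows is a minimal sketch, not the answer's original code: `chunks_for_workers` and `parse_user_line` are hypothetical helper names, and it assumes Ruby 2.2 or later, where `Etc.nprocessors` from the standard library can stand in for the hand-written `number_of_processors`.

```ruby
require 'etc'

# Hypothetical helper: split +lines+ into at most +workers+ chunks of
# nearly equal size, so that each forked child can be handed one chunk.
def chunks_for_workers(lines, workers = Etc.nprocessors)
  return [] if lines.empty?
  # Round up so that no line is left over after the last chunk.
  chunk_size = (lines.size.to_f / workers).ceil
  lines.each_slice(chunk_size).to_a
end

# Parse one tab-separated line into the attributes used by the rake task.
def parse_user_line(line)
  name, age, profession = line.strip.split("\t")
  { :name => name, :age => age, :profession => profession }
end

# Made-up sample data, for illustration only.
sample = ["Ann\t31\tengineer\n", "Bob\t45\tchef\n", "Cy\t28\tpilot\n"]
chunks = chunks_for_workers(sample, 2)
puts chunks.map(&:size).inspect
puts parse_user_line(sample.first).inspect
```

Inside the rake task, each chunk would then be passed to its own `fork` block, followed by `Process.waitall` as in the answer. Rounding the chunk size up keeps the number of chunks at or below the number of workers.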

You should give it a try. For me, it is very fast and very easy to use.

ar-extensions (and its replacement for Rails 3, activerecord-import) does not have to skip model validations. That is optional, depending on your needs and on how much speed you want.
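
As a rough sketch of what a bulk import with optional validation might look like: the snippet below assumes the activerecord-import gem is loaded and that a `User` model with `name`, `age` and `profession` columns exists, as in the question. `rows_from_tsv`, `import_users` and the batch size of 1000 are illustrative choices, not taken from the posts above.

```ruby
USER_COLUMNS = [:name, :age, :profession].freeze

# Turn tab-separated lines into arrays of column values, skipping blank lines.
def rows_from_tsv(lines)
  lines.map(&:strip).reject(&:empty?).map { |line| line.split("\t") }
end

# Insert the rows in batches instead of saving one record at a time.
# Assumes activerecord-import is available, which adds a class-level
# import(columns, values, options) method to models.
# Pass validate: false to trade model validations for speed.
def import_users(path, validate: true, batch_size: 1000)
  rows_from_tsv(File.foreach(path)).each_slice(batch_size) do |batch|
    User.import(USER_COLUMNS, batch, validate: validate)
  end
end
```

From a rake task that depends on `:environment`, this could be called as `import_users('users.txt')`, or as `import_users('users.txt', validate: false)` when the data is already known to be clean.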