Ruby on Rails: CSV import in Rails - invalid byte sequence in UTF-8 with non-English characters


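I am using the csv-mapper gem to import some records from a CSV file into a Rails 3 model. (I am using this gem because I found it to be the easiest way.)

Anyway, the code I use to import the records looks like this: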

r = import('doc/socios_full.csv') do
    map_to Associate
    after_row lambda{|row, associate| associate.save }
    start_at_row 1
    [group,member,family_relationship_code,family_relationship_description,last_name,names,...]
#The previous line is actually longer, with more atts, but it's been cut to explain the example
end

It works very well, except when the parser encounters non-English characters such as ó, é, ñ, í, °… At that point I get the following error:

ArgumentError: invalid byte sequence in UTF-8
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1831:in `sub!'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1831:in `block in shift'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1825:in `loop'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1825:in `shift'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1767:in `each'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/csv-mapper-0.5.1/lib/csv-mapper.rb:106:in `each_with_index'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/csv-mapper-0.5.1/lib/csv-mapper.rb:106:in `import'
    from (irb):63
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands/console.rb:44:in `start'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands/console.rb:8:in `start'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands.rb:23:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'
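I am fairly sure this is the cause, because if I replace all of these characters the problem goes away until the parser hits another non-English character. The trouble is that the file has 50k records, so hunting down every character I can think of and retrying the whole import each time is very time-consuming.

Is there a way to ignore these errors and let the parser continue? Or is there an easier way to import this CSV file?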
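
The failure described in the question can be reproduced without the gem or the CSV file. The following is a minimal sketch, assuming the file is encoded as ISO-8859-1 (the question does not state the actual encoding, so this is a guess); the sample name is made up for illustration.

```ruby
# Assumption: the file is ISO-8859-1, where "ñ" is the single byte 0xF1.
# On its own, 0xF1 is not a valid UTF-8 sequence.
raw = "Nu\xF1ez".b # raw bytes, as they would come out of a Latin-1 file

# Labelling those bytes as UTF-8 yields an invalid string; this is the
# condition that makes the CSV parser raise ArgumentError.
as_utf8 = raw.dup.force_encoding('UTF-8')
utf8_valid = as_utf8.valid_encoding?

# Relabel the bytes with their real encoding, then transcode to UTF-8.
fixed = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
fixed_valid = fixed.valid_encoding?
```

Here `utf8_valid` is false and `fixed_valid` is true: the data is not corrupt, it is only being read with the wrong encoding label.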

Perhaps you could try the following:

csv_string.force_encoding('ISO-8859-1')
CSV.foreach(filename, :headers => true, :encoding => 'ISO-8859-1') do |row|
  # process each row here
end

I solved the problem with a different approach, which is also an easier way to import a CSV file into a Rails 3 model than using an external gem:

    require 'csv'
    CSV.foreach('doc/socios_full.csv') do |row|
        record = Associate.new(
            :media_format   => row[0], 
            :group => row[0],
            :member => row[1],
            :family_relationship_code => row[2],
            :family_relationship_description => row[3],
            :last_name => row[4],
            :names => row[5],
            ...
        )
        record.save!
    end
It works perfectly, even with non-English characters (I just tried it on a 75k-record import file!). I hope this helps someone.

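Do this:

csv_string.force_encoding('ISO-8859-1')
CSV.foreach(filename, :headers => true, :encoding => 'ISO-8859-1') do |row|
  # process each row here
end

I ran into the same problem when trying to read a CSV file that had been saved from MS Excel. You can specify the encoding as an option. I guess it assumes UTF-8 by default.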
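
As a rough illustration of the encoding option, the sketch below writes a small ISO-8859-1 file and reads it back with `CSV.foreach`. The sample data and the temporary file are hypothetical, and the `'ISO-8859-1:UTF-8'` form (read as Latin-1, transcode to UTF-8) is one possible choice, not necessarily what the answer above intended.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample: a Latin-1 encoded CSV, similar to what Excel might save.
latin1_bytes = "last_name,names\nNúñez,José\n".encode('ISO-8859-1')

rows = []
Tempfile.create(['socios', '.csv']) do |file|
  file.binmode            # write the bytes exactly as they are
  file.write(latin1_bytes)
  file.flush

  # "external:internal": read the file as ISO-8859-1 and hand back UTF-8 strings.
  CSV.foreach(file.path, headers: true, encoding: 'ISO-8859-1:UTF-8') do |row|
    rows << row.to_h
  end
end
```

Transcoding to UTF-8 at read time means the strings handed to the model are already in the encoding Rails expects, so nothing needs to be converted again before saving.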

If you are sure the CSV will contain the correct header names, the following method should work for any model:

  def self.import(file)
    CSV.foreach(file.path, headers: true) do |row|
      obj = self.new
      obj.attributes.each_key do |attribute|
        index = row.headers.index(attribute)
        obj.send("#{attribute}=",row[index]) if index
      end
      obj.save
    end
  end
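
The header-to-attribute mapping above can be tried outside Rails. This is only a sketch: the `Associate` Struct is a hypothetical stand-in for the ActiveRecord model, `import_rows` is an illustrative helper name, and the sample CSV text is made up.

```ruby
require 'csv'

# Hypothetical stand-in for the ActiveRecord model, so the sketch runs without Rails.
Associate = Struct.new(:last_name, :names, keyword_init: true)

# Illustrative helper: builds one object per CSV row, matching headers to attributes.
def import_rows(csv_text, model)
  known = model.members.map(&:to_s)
  CSV.parse(csv_text, headers: true).map do |row|
    # Keep only the columns whose header names an attribute; ignore the rest.
    attrs = row.to_h.select { |header, _value| known.include?(header) }
    model.new(**attrs.transform_keys(&:to_sym))
  end
end

associates = import_rows("last_name,names,unused\nNúñez,José,x\n", Associate)
```

Columns with unknown headers (`unused` here) are silently skipped, which mirrors the `if index` guard in the method above.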

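Do you know how your CSV file is encoded, or where it originally came from?

Yes, it comes from a clientes export from a Spanish organization. I have already solved it!

Not possible, "r" is an array, so the force_encoding method does not work.

Nice, I really like the "do it by hand and get it running" approach.

I completely agree. Also, the fewer external gems the better. (I am still learning Ruby and Rails, so I try to learn a little more every day.)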