Ruby: pound sign £ causes PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0xa3

ruby postgresql ruby-on-rails-4 encoding utf-8

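When I collect information containing the pound sign "£" from an external source (such as my bank) via a CSV file and post it to Postgres using ActiveRecord, I get the error:

PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0xa3

0xa3 is the hex code for the £ sign. The sensible thing seemed to be to explicitly specify UTF-8 on the string while replacing invalid byte sequences: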

string.encode('UTF-8', {:invalid => :replace, :undef => :replace, :replace => '?'})

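This stops the error, but it is a lossy fix because it converts "£" to "?".

UTF-8 is able to handle the "£" sign, so what can be done to fix the invalid byte sequence and keep the "£" sign?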

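I'm going to answer my own question, thanks to Michael Fuhr, who explained that the UTF-8 encoding of the pound sign is 0xc2 0xa3. So all you have to do is find every occurrence of 0xa3 (163) that is not already preceded by 0xc2 (194) and put 0xc2 in front of it: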

array_bytes = string.bytes.to_a
search_from = 0
patched = false
# Look for the £ sign (0xa3 = 163)
while (offset = array_bytes[search_from..-1].index(163))
  pound_ptr = search_from + offset # absolute index of this 0xa3
  # A 0xa3 that is not preceded by 0xc2 (194) is an incorrectly sequenced £ sign
  if pound_ptr == 0 || array_bytes[pound_ptr - 1] != 194
    array_bytes.insert(pound_ptr, 194)
    pound_ptr += 1 # the 0xa3 has moved one place to the right
    patched = true
  end
  # Search the remainder of the array for the next pound sign
  search_from = pound_ptr + 1
end
# Convert bytes to 8-bit unsigned chars, and tag the result as UTF-8
string = array_bytes.pack('C*').force_encoding('UTF-8') if patched
# Can now write the string to the model without the invalid byte sequence error
hash["description"] = string
Model.create!(hash)

I've had a lot of help on this Stack Overflow forum, and I hope I've helped someone else.
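
The byte values mentioned in this answer can be checked directly in Ruby. This is only an illustrative sketch; the variable names are made up for the example. It shows that "£" is one byte (0xa3) in ISO-8859-1 but two bytes (0xc2 0xa3) in UTF-8, and that a lone 0xa3 labelled as UTF-8 is not a valid sequence, which is what Postgres is complaining about.

```ruby
# "\u00A3" is the pound sign written as a Unicode escape (a UTF-8 string).
utf8_pound   = "\u00A3"
latin1_pound = utf8_pound.encode("ISO-8859-1")

utf8_bytes   = utf8_pound.bytes.to_a    # two bytes: 0xc2 0xa3
latin1_bytes = latin1_pound.bytes.to_a  # one byte: 0xa3

# Relabel the single Latin-1 byte as UTF-8 without converting it.
mislabelled     = latin1_pound.dup.force_encoding("UTF-8")
lone_byte_valid = mislabelled.valid_encoding?

puts utf8_bytes.inspect    # [194, 163]
puts latin1_bytes.inspect  # [163]
puts lone_byte_valid       # false
```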

0xa3 is the code point for the pound sign in Microsoft's CP1252 (and ISO 8859-1). Your data is probably not encoded as UTF-8.

You're right @wildplasser, the source file has Microsoft encoding: it is an HTML file downloaded with an .xls extension. Ruby treated it as UTF-8, and the pound sign was not preceded by the correct byte sequence.
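
If the source really is CP1252, as the comments suggest, an alternative to patching individual bytes is to tell Ruby the true source encoding and let it transcode the whole string. The following is a minimal sketch under that assumption; `cp1252_to_utf8` is a hypothetical helper name, not something from the original post. Unlike the byte-insertion approach, it also handles other non-ASCII characters, and it cannot mistake a 0xa3 that is a legitimate continuation byte of some other UTF-8 character (for example 0xc3 0xa3, "ã") for a bare pound sign.

```ruby
# Hypothetical helper: assumes the raw bytes are Windows-1252 (CP1252).
def cp1252_to_utf8(raw)
  # Relabel the bytes with their real encoding, then transcode to UTF-8.
  raw.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Build a sample as it might arrive from the file: "Cost: " + 0xa3 + "12.50".
raw = "Cost: ".b + [0xA3].pack("C") + "12.50"

fixed = cp1252_to_utf8(raw)

puts fixed.encoding         # UTF-8
puts fixed.valid_encoding?  # true
puts fixed.bytes.to_a.include?(0xC2)  # true: 0xa3 became 0xc2 0xa3
```

The same conversion can usually be done while reading the file, for example by opening it with the mode string `"r:Windows-1252:UTF-8"`. Note that a few byte values are undefined in CP1252, so `encode` may raise `Encoding::UndefinedConversionError` on data that is not actually CP1252.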