Ruby on Rails: normalizing CSV file data
I'm trying to import a TSV (tab-separated values) file into my database, but the file is not formatted correctly. The price and count columns are separated only by a space (except in the header row), so both values end up under the price key, which shifts all the following data into the wrong key/value pairs.

The TSV file:
purchaser name item description price count merchant address merchant name
Alice Bob $10 off $20 of food 10.0 2 987 Fake St Bob's Pizza
Example Name $30 of awesome for $10 10.0 5 456 Unreal Rd Tom's Awesome Shop
Name Three $20 Sneakers for $5 5.0 1 123 Fake St Sneaker Store Emporium
John Williams $20 Sneakers for $5 5.0 4 123 Fake St Sneaker Store Emporium
In /models/purchase.rb:
require "csv"

class Purchase < ActiveRecord::Base
  # validates :item_price, :numericality => { :greater_than_or_equal_to => 0 }

  # Reads the uploaded tab-separated file and creates one Purchase per row.
  def self.import(file)
    CSV.foreach(file.path, :headers => true,
                :header_converters => lambda { |h| h.downcase.gsub(' ', '_') },
                :col_sep => "\t"
    ) do |row|
      # debugger
      purchase_hash = row.to_hash
      Purchase.create!(purchase_hash)
    end
  end
end
The schema:

ActiveRecord::Schema.define(version: 20160601205154) do
  create_table "purchases", force: :cascade do |t|
    t.string   "purchaser_name"
    t.string   "item_description"
    t.string   "price"
    t.string   "count"
    t.string   "merchant_address"
    t.string   "merchant_name"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end
end
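I originally had price as a decimal and count as an integer, but switched them back to strings while trying to find a solution. I can change them back if that helps (and I would prefer to, if possible).

You could try shifting the merchant address and merchant name values over, then split the squashed price/count field on a space and assign the two parts as price and count: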
purchase_hash = row.to_hash
# The keys are strings here, because the header converter returns strings.
purchase_hash["merchant_name"] = purchase_hash["merchant_address"]
purchase_hash["merchant_address"] = purchase_hash["count"]
splitted_price_count = purchase_hash["price"].split(" ")
purchase_hash["price"] = splitted_price_count.first
purchase_hash["count"] = splitted_price_count.last
Purchase.create!(purchase_hash)
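To check the shifting logic without touching the database, here is a minimal plain-Ruby sketch. The helper name fix_shifted_row and the sample hash are illustrative assumptions, not part of the original code; the hash mimics what row.to_hash would return for the first data row when price and count share a single field.

```ruby
# Hypothetical helper: repairs a row in which "price" holds both values
# and every later column has shifted one position to the left.
def fix_shifted_row(hash)
  fixed = hash.dup
  fixed["merchant_name"]    = hash["merchant_address"]
  fixed["merchant_address"] = hash["count"]
  price, count = hash["price"].split(" ")
  fixed["price"] = price
  fixed["count"] = count
  fixed
end

# Assumed shape of the misparsed first row (string keys, values still strings).
misparsed = {
  "purchaser_name"   => "Alice Bob",
  "item_description" => "$10 off $20 of food",
  "price"            => "10.0 2",
  "count"            => "987 Fake St",
  "merchant_address" => "Bob's Pizza",
  "merchant_name"    => nil
}

fixed = fix_shifted_row(misparsed)
```

Working on a copy keeps the original row intact, which makes it easier to compare before and after while debugging.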
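There are two parts to solving this. First, define a converter that splits the field into two parts during parsing (and converts them to numbers along the way):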
CONVERTER_SPLIT_PRICE_COUNT = lambda do |value, info|
  next value unless info.header == "price"
  price, count = value.split
  [ price.to_f, count.to_i ]
end
This turns the price field into an array, e.g. "10.0 2" becomes [10.0, 2].
Second, define a method that, after parsing, fixes the misplaced values and returns a correct hash:
def row_to_hash_fixing_price_count(row)
  row.headers.zip(row.fields.flatten).to_h
end
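The above flattens the price/count array into its parent array (the rest of the row), then zips it with the headers array. Since there are now more fields than headers, the extra nil at the end is dropped.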
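The flatten-then-zip step can be tried on plain arrays. This is a small sketch; the fields array is an assumed stand-in for what row.fields would hold after the converter has run on the first data row.

```ruby
headers = %w[purchaser_name item_description price count merchant_address merchant_name]

# Assumed row contents: "price" is already an array, and the row was padded
# with a trailing nil because it had one field fewer than the header row.
fields = ["Alice Bob", "$10 off $20 of food", [10.0, 2], "987 Fake St", "Bob's Pizza", nil]

flat   = fields.flatten        # the nested [10.0, 2] is spliced in place: 7 elements
result = headers.zip(flat).to_h # zip stops at the receiver's length, so the nil is dropped
```

Note that flatten would also splice in any other field that happened to be an array, so this relies on price being the only converted field.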
You would use them like this:
csv_opts = {
  headers: true,
  col_sep: "\t",
  header_converters: ->(h) { h.downcase.tr(" ", "_") },
  converters: CONVERTER_SPLIT_PRICE_COUNT
}

# data is the TSV content (a String or an IO object).
# The double splat passes the options as keyword arguments, which Ruby 3 requires.
data_out = CSV.new(data, **csv_opts).map do |row|
  row_to_hash_fixing_price_count(row)
end
# => [ { "purchaser_name" => "Alice Bob",
#        "item_description" => "$10 off $20 of food",
#        "price" => 10.0,
#        "count" => 2,
#        "merchant_address" => "987 Fake St",
#        "merchant_name" => "Bob's Pizza"
#      },
#      # ...
#    ]
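For a version that runs outside Rails, the sketch below feeds the parser an inline string. The sample data string is an assumption: it uses real tab characters between columns and a single space between the price and count values, as described in the question. The constant name SPLIT_PRICE_COUNT is only illustrative.

```ruby
require "csv"

# Splits the combined "price count" field into [Float, Integer];
# every other field is passed through unchanged.
SPLIT_PRICE_COUNT = lambda do |value, info|
  next value unless info.header == "price"
  price, count = value.split
  [price.to_f, count.to_i]
end

# Assumed sample input: tabs between columns, a space between price and count.
data = "purchaser name\titem description\tprice\tcount\tmerchant address\tmerchant name\n" \
       "Alice Bob\t$10 off $20 of food\t10.0 2\t987 Fake St\tBob's Pizza\n"

rows = CSV.new(
  data,
  headers: true,
  col_sep: "\t",
  header_converters: ->(h) { h.downcase.tr(" ", "_") },
  converters: SPLIT_PRICE_COUNT
).map { |row| row.headers.zip(row.fields.flatten).to_h }

first = rows.first
```

Because the converter already returns a Float and an Integer, the price and count columns could go back to decimal and integer types, as the question hoped; for money, a decimal conversion rather than to_f may be preferable.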
Consider creating the records in bulk rather than one at a time. Given the above, you can simply call Purchase.create!(data_out).