Ruby: How to group items by a specific column in a tab-delimited file


I have the following records in a tab-delimited text file:

sku title   Product Type                        
19686940    This is test Title1 toys                        
19686941    This is test Title2 toys                        
19686942    This is test Title3 toys                        
20519300    This is test Title1 toys2                       
20519301    This is test Title2 toys2
20580987    This is test Title1 toys3                       
20580988    This is test Title2 toys3                       
20582176    This is test Title1 toys4

How can I group the items by Product Type and find all the unique words in their titles?

Output format:
Product Type   Unique_words 
------------   ------------ 
toys           This is test Title1 Title2 Title3
toys2          This is test Title1 Title2
toys3          This is test Title1 Title2
toys4          This is test Title1

Update

So far, I have written code that reads the file and stores the records in an array:
class Product
    attr_reader :sku, :title, :productType
    def initialize(sku,title,productType)
      @sku = sku
      @title = title
      @productType = productType
    end

    def sku
      @sku
    end

    def title
      @title
    end

    def productType
      @productType
    end
end

class FileReader
  def ReadFile(m_FilePath)
    array = Array.new
    lines = IO.readlines(m_FilePath)

    lines.each_with_index do |line, i|
      current_row = line.split("\t")
      product = Product.new(current_row[0],current_row[1],current_row[2])

      array.push product
    end
  end
end

filereader_method = FileReader.new.method("ReadFile")
Reading =  filereader_method.to_proc

puts Reading.call("Input.txt")  

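To get the grouping, you can use Enumerable#group_by.
The beauty of Ruby is that you have many options. You could also check out the CSV and OpenStruct libraries, since a product here is just a data object: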
require 'csv'
require 'ostruct'

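def products_by_type(file_path)
  # The :symbol header converter turns "Product Type" into :product_type.
  csv_opts = { col_sep: "\t",
               headers: true,
               header_converters: [:downcase, :symbol] }

  # Options are passed as keywords; foreach opens and closes the file itself.
  CSV.foreach(file_path, **csv_opts)
     .map{ |row| OpenStruct.new row.to_hash }
     .group_by{ |product| product.product_type }
end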
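
To make the shape of the result concrete, here is a minimal sketch that parses a small inline sample with CSV.parse instead of reading a file. The sample string and the variable names (sample, rows, grouped) are made up for illustration, and only the :symbol header converter is used:

```ruby
require 'csv'

# Hypothetical inline sample standing in for the tab-delimited file.
sample = [
  "sku\ttitle\tProduct Type",
  "19686940\tThis is test Title1\ttoys",
  "19686941\tThis is test Title2\ttoys",
  "20519300\tThis is test Title1\ttoys2"
].join("\n")

# With headers: true, CSV.parse returns a CSV::Table whose rows are keyed by header.
rows = CSV.parse(sample, col_sep: "\t", headers: true, header_converters: :symbol)

# group_by returns a Hash: product type => array of rows with that type.
grouped = rows.group_by { |row| row[:product_type] }

grouped.each do |type, group|
  puts "#{type}: #{group.map { |row| row[:sku] }.join(', ')}"
end
```

Each key of the hash is one product type, and each value is the list of rows that share it, which is the structure the later formatting step iterates over.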

Or use the idiom of initializing an object from hash keys, which removes the call to #to_hash on the row above:
class Product
  attr_accessor :sku, :title, :product_type

  def initialize(data)
    # Call the setter named after each key, e.g. self.title = value.
    data.each{ |key, value| public_send("#{key}=", value) }
  end
end

def products_by_type(file_path)
  csv_opts = {
    # ... same options as in the previous example
  }

  CSV.foreach(file_path, **csv_opts)
     .map{ |row| Product.new row }
     .group_by{ |product| product.product_type }
end

Then, working from the hash, format the output as needed:
def unique_title_words(products)
  # Collect every word from every title, then drop the duplicates.
  products.flat_map{ |product| product.title.scan(/\w+/) }
          .uniq
end

puts "Product Type\tUnique Words"
products_by_type("./file.txt").each do |type, products|
  puts "#{type}\t#{unique_title_words(products).join(' ')}"
end
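
For comparison, the same result can be sketched without the CSV library, using plain String#split and a hash of arrays. This is an alternative technique rather than the approach above; the inline lines array and the name words_by_type are assumptions for illustration, and in practice the lines would come from something like File.readlines:

```ruby
# Hypothetical inline sample; each element is one tab-delimited line of the file.
lines = [
  "sku\ttitle\tProduct Type",
  "19686940\tThis is test Title1\ttoys",
  "19686941\tThis is test Title2\ttoys",
  "19686942\tThis is test Title3\ttoys",
  "20519300\tThis is test Title1\ttoys2",
  "20582176\tThis is test Title1\ttoys4"
]

# Maps each product type to its unique title words, in first-seen order.
words_by_type = Hash.new { |hash, key| hash[key] = [] }

lines.drop(1).each do |line| # drop(1) skips the header row
  _sku, title, type = line.chomp.split("\t").map(&:strip)
  words_by_type[type] |= title.split # Array#| keeps only words not seen yet
end

# Print a left-aligned, two-column table similar to the requested format.
puts format("%-14s %s", "Product Type", "Unique_words")
puts format("%-14s %s", "-" * 12, "-" * 12)
words_by_type.each do |type, words|
  puts format("%-14s %s", type, words.join(" "))
end
```

Splitting by hand is fine for simple data like this, but it does not handle quoted fields, so the CSV-based version is the safer choice for real files.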

Could you also give some sample output? Show us what you have tried so far and what is not working.
Please see the updated question.