Ruby FasterCSV: trying to find unique items

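I have a CSV file, and I'm trying to find all the unique values in the columns after the first, across rows where column 1 has the same value, and merge them into a new CSV file. I know that sounds confusing, so here's an example: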

Example of the original file, foo.csv:

"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity"
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity"
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height"
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension"
"Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"

Desired result, bar.csv:

"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity","Up & Over Height","Platform Capacity",,,
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"
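
Every row has a different length, and it's a fairly large file (over 5k lines). I'm completely lost on how to do the matching/string manipulation. And yes, some rows have trailing commas where there are "empty cells". I've been using FasterCSV, so if there's a way to do it with that, that would be great.

Any pointers? Preferably something that won't bring my MBP to a screeching halt?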

Assuming you can get it into a 2D array with FasterCSV:

a = [
  ["Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity"],
  ["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity"],
  ["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height"],
  ["Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"],
  ["Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension"],
  ["Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"]
]

# Group the rows by their first column, then flatten each [key, rows] pair
# and drop duplicates, keeping the first occurrence of each value.
a.group_by {|e| e[0]}.map {|e| e.flatten.uniq}

gives you:

[
  ["Boom Lifts", "Model Number", "Manufacturer", "Platform Height", "Horizontal Outreach", "Lift Capacity", "Up & Over Height", "Platform Capacity"],
  ["Pusharound Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height"],
  ["Scissor Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height", "Overall Dimensions", "Platform Extension", "Platform Size", "Lift Capacity"]
]

It won't be instantaneous, but it shouldn't bring the MBP down.
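
For the full foo.csv to bar.csv round trip, here is a minimal sketch of how the same grouping could be wired to file I/O. It assumes Ruby 1.9 or later, where FasterCSV became the standard `csv` library. The helper names `merge_rows` and `merge_csv` are made up for illustration. Dropping the nil cells produced by trailing commas and quoting every output field are also assumptions about what the output should look like, not something the answer above specifies.

```ruby
require "csv"

# Hypothetical helper: merge rows that share a first column, keeping each
# value once in order of first appearance. nil cells (from trailing commas)
# are dropped, and blank lines (parsed as empty rows) are skipped.
def merge_rows(rows)
  rows
    .reject { |row| row.empty? }
    .group_by { |row| row.first }
    .values
    .map { |group| group.flatten.compact.uniq }
end

# Hypothetical helper: read in_path, merge the rows, write out_path.
# force_quotes: true quotes every field, matching the sample files.
def merge_csv(in_path, out_path)
  rows = CSV.read(in_path)
  CSV.open(out_path, "w", force_quotes: true) do |csv|
    merge_rows(rows).each { |row| csv << row }
  end
end
```

Calling merge_csv("foo.csv", "bar.csv") would then produce the merged file. Reading everything into memory at once should be fine for a file of around 5k rows.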

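So, a) the first column can be treated as a key, b) all subsequent columns can be treated as values in a list, and in the end you want that list to contain only unique values...? The last line of bar.csv repeats "Overall Dimensions" and "Platform Extension". Are duplicate values OK?

My mistake, Overall Dimensions and Platform Extension should not be repeated. I want to read in a file foo.csv with FasterCSV and spit out another, bar.csv. Thanks.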