Ruby: How to extract certain rows from a CSV file and save each one in its own file


I have a CSV file with many rows:

Username,Year,Month,Match (0-60%),Match (60-65%),Match (65-70%),Match (70-75%),Match (75-80%),Match (80-85%),Match (85-90%),Match (90-95%),Match (95-100%),Match (100%),New_total,Edit_total,Review_total
Joe,2020,3,52,0,5,2,3,2,0,5,0,0,69,142,337
Engineering,2020,3,6469,0,0,0,0,0,0,0,0,0,6469,82,0
User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0
User_TR1_FR_FR,2020,3,66,3,0,0,0,0,0,0,0,2,71,82,0
User_TR1_JA_JP,2020,3,35,49,56,114,0,21,22,66,62,0,425,630,0
User_TR1_KO_KR,2020,3,60,0,0,0,0,0,10,0,0,0,70,0,0
User_TR1_NL_NL,2020,3,61,2,41,59,15,31,11,13,2,0,235,0,0
User_TR1_PL_PL,2020,3,134,17,41,57,15,31,21,13,0,0,329,15,0
User_TR1_PT_BR,2020,3,37,0,2,0,0,12,0,0,0,22,73,53,0
Marie,2020,3,37,15,8,2,0,2,0,5,0,0,69,40,0
Charly,2020,3,224,0,0,0,0,0,0,0,0,0,224,28,0
Joseph,2020,3,56,0,0,0,0,0,0,0,0,0,56,0,0
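
I want to create a separate CSV file only for the rows whose first column contains a string of the form User_XXX_XX_XX. All other rows should be ignored. Each generated file should be named after the first element of its row.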

For example:

User_TR1_ES_ES.csv
User_TR1_PT_BR.csv

My code so far is:

#!/usr/bin/env ruby

require 'csv'
require 'fileutils'

this_dir = File.expand_path(File.dirname(__FILE__))
original_dir = File.join(this_dir, '_Original')

#working with the .CSV file, there should be only one, and always be .CSV....
puts "Finding a .CSV file..."
full_path = Dir.glob('**/*.csv')
full_path.each do |csv|
  puts "CSV file found: #{File.basename(csv)}"
end

new_path = File.join(this_dir, full_path[0])

#I start reading the CSV file found in the folder
parsed_data = CSV.read(new_path)

#I grab the header in a separate variable
header = parsed_data.shift

#I created a constant to look for lines where the first elements meets the desired string, not sure about it...
USER_NAME = 'User' + '_' + 'TR' + 1..10 + ([a-z].upcase * 2) + '_' + ([a-z].upcase * 2)

#I want loop through each line and look for the those that includes the constant in the first element
CSV.foreach(new_path) do |row|
  row.first[0].include?(USER_NAME)

  #create inviduals files in a same location
  new_dir = File.join(this_dir, '_result')
  FileUtils.mkdir(new_dir)
  newfiles = File.join(new_dir, CONSTANT, '.csv')

  CSV.open(newfiles, 'w+') do |csv|
    csv << header
    csv << captured_row
  end
Personally, I wouldn't even bother treating the file as CSV, since it doesn't matter when you can easily grab the User_TR1... rows:

header = nil

DATA.each_line { |l|
  if header.nil?
    header = l
    next
  end

  fn = l[/^User_TR1_[^,]+/]
  next unless fn

  File.write(fn + '.csv', header + l)
}

__END__
Username,Year,Month,Match (0-60%),Match (60-65%),Match (65-70%),Match (70-75%),Match (75-80%),Match (80-85%),Match (85-90%),Match (90-95%),Match (95-100%),Match (100%),New_total,Edit_total,Review_total
Joe,2020,3,52,0,5,2,3,2,0,5,0,0,69,142,337
Engineering,2020,3,6469,0,0,0,0,0,0,0,0,0,6469,82,0
User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0
User_TR1_FR_FR,2020,3,66,3,0,0,0,0,0,0,0,2,71,82,0
User_TR1_JA_JP,2020,3,35,49,56,114,0,21,22,66,62,0,425,630,0
User_TR1_KO_KR,2020,3,60,0,0,0,0,0,10,0,0,0,70,0,0
User_TR1_NL_NL,2020,3,61,2,41,59,15,31,11,13,2,0,235,0,0
User_TR1_PL_PL,2020,3,134,17,41,57,15,31,21,13,0,0,329,15,0
User_TR1_PT_BR,2020,3,37,0,2,0,0,12,0,0,0,22,73,53,0
Marie,2020,3,37,15,8,2,0,2,0,5,0,0,69,40,0
Charly,2020,3,224,0,0,0,0,0,0,0,0,0,224,28,0
Joseph,2020,3,56,0,0,0,0,0,0,0,0,0,56,0,0

It creates:

-rw-r--r--@  1 TTM  staff   256B May  3 17:05 User_TR1_ES_ES.csv
-rw-r--r--@  1 TTM  staff   253B May  3 17:05 User_TR1_FR_FR.csv
-rw-r--r--@  1 TTM  staff   263B May  3 17:05 User_TR1_JA_JP.csv
-rw-r--r--@  1 TTM  staff   253B May  3 17:05 User_TR1_KO_KR.csv
-rw-r--r--@  1 TTM  staff   259B May  3 17:05 User_TR1_NL_NL.csv
-rw-r--r--@  1 TTM  staff   262B May  3 17:05 User_TR1_PL_PL.csv
-rw-r--r--@  1 TTM  staff   255B May  3 17:05 User_TR1_PT_BR.csv

Each of which looks like:

cat User_TR1_ES_ES.csv

Username,Year,Month,Match (0-60%),Match (60-65%),Match (65-70%),Match (70-75%),Match (75-80%),Match (80-85%),Match (85-90%),Match (90-95%),Match (95-100%),Match (100%),New_total,Edit_total,Review_total
User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0
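
In this example I'm taking advantage of Ruby's ability to store data after __END__ at the bottom of the script. DATA is created by Ruby as a file handle to the content that follows __END__, so pay no attention to the man behind the curtain.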

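Simply use File.foreach to read your input file, grab the first line as the header, then loop back and read the next line. From that point on, just look for lines that match the /^User_TR1_[^,]+/ pattern.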

Here is what is returned:

'Engineering,2020,3,6469,0,0,0,0,0,0,0,0,0,6469,82,0'[/^User_TR1_[^,]+/] # => nil
'User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0'[/^User_TR1_[^,]+/] # => "User_TR1_ES_ES"
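
So, if the line is not a User_TR1... line, nil is returned and the code skips to the next iteration. If the line is a User_TR1... line, the matched string is returned and the code writes the header and the line using File.write.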
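
The File.foreach variant described above might look like the following minimal sketch. The method name split_user_rows, its parameters, and the input file name in the usage note are assumptions made for illustration; they do not appear in the original post.

```ruby
# Sketch: copy each User_TR1_* line of a CSV-like text file into its own file.
# split_user_rows, input_path and out_dir are hypothetical names.
def split_user_rows(input_path, out_dir = '.')
  header = nil
  written = []

  File.foreach(input_path) do |line|
    # The first line read is the header; remember it and move on.
    if header.nil?
      header = line
      next
    end

    # String#[] with a regex returns the matched text, or nil when nothing matches.
    name = line[/^User_TR1_[^,]+/]
    next unless name

    path = File.join(out_dir, "#{name}.csv")
    File.write(path, header + line)
    written << path
  end

  written
end
```

Calling split_user_rows('input.csv') would write the files next to the script and return their paths. Because File.foreach reads one line at a time, the whole input never has to fit in memory.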

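Also, if it were my system, I would convert the file name to lowercase when creating it. As a system administrator I learned to avoid uppercase or mixed-case file names, because they make it easy to mistype the name.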

'User_TR1_ES_ES'.downcase + '.csv' # => "user_tr1_es_es.csv"

Also, see my comment above about using a Ruby class. It is easy to implement and very efficient.


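You can try this code; it works even with large CSV files:

require 'csv'

# path_to_your_csv_file is a placeholder for the path of the input file.
CSV.foreach(path_to_your_csv_file, headers: true) do |line|
  # With headers: true each line is a CSV::Row; line[0] is the Username field.
  next unless line[0].match?(/User_TR\d_[a-zA-Z]{2}_[a-zA-Z]{2}/)

  file_name = "#{line[0].downcase}.csv"
  existed = File.exist?(file_name) # File.exists? was removed in Ruby 3.2
  CSV.open(file_name, 'a') do |out|
    out << line.headers unless existed # write the header row once per file
    out << line
  end
end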
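
When writing character classes for patterns like the one above, the case of the range endpoints matters: a class spelled [a-zA-z] spans more than letters. The short sketch below illustrates this; the anchors \A and \z and the malformed sample name are assumptions added for the demonstration.

```ruby
# [A-z] covers every ASCII code point from 'A' (65) to 'z' (122),
# which includes the six characters [ \ ] ^ _ ` that sit between 'Z' and 'a'.
loose  = /\AUser_TR\d_[a-zA-Z]{2}_[a-zA-z]{2}\z/
strict = /\AUser_TR\d_[a-zA-Z]{2}_[a-zA-Z]{2}\z/

sample = 'User_TR1_ES_E_' # hypothetical malformed name

puts loose.match?(sample)             # true, because '_' falls inside A-z
puts strict.match?(sample)            # false
puts strict.match?('User_TR1_ES_ES')  # true
```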
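
Let's first create a CSV file:

CSV_In = 't.csv'

We then read the headers from it:

require 'csv'

csv = CSV.open(CSV_In, headers: true)
headers = csv.gets.headers
  #=> ["Username", "Year", "Month", "Match (0-60%)"]
csv.close

See CSV::open, CSV#gets (aka shift) and CSV::Row#headers.

Now we can use foreach to open the CSV file and step through its rows, writing an output file where appropriate:

CSV.foreach(CSV_In, headers: true) do |row|
  user_name = row['Username']
  CSV.open("#{user_name}.csv", "w") {|csv| csv << headers << row} if
    user_name.match?(/\AUser_TR1(?:_[A-Z]{2}){2}\z/)
end

The regular expression can be modified as needed. For example, one possibility is:

/\AUser_[A-Z]{2}\d(?:_[A-Z]{2}){2}\z/

One advantage of using CSV methods (as opposed to IO or File methods) to read the input file is that the code does not need to change if the order of the columns changes later.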
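
To illustrate the point about column order, here is a small sketch. The two input strings and the helper name user_names are made up for the example; only the 'Username' header comes from the question's data.

```ruby
require 'csv'

# Two hypothetical inputs holding the same record with columns in a different order.
original  = "Username,Year,Month\nUser_TR1_ES_ES,2020,3\n"
reordered = "Month,Username,Year\n3,User_TR1_ES_ES,2020\n"

# Look the field up by header name rather than by position.
def user_names(text)
  CSV.parse(text, headers: true).map { |row| row['Username'] }
end

puts user_names(original).inspect   # ["User_TR1_ES_ES"]
puts user_names(reordered).inspect  # ["User_TR1_ES_ES"]

# A positional approach picks up the wrong field once the columns move.
first_fields = reordered.lines.drop(1).map { |l| l.split(',').first }
puts first_fields.inspect           # ["3"]
```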
