Ruby: how to take certain lines of a CSV file and save each one in its own file
I have a CSV file with many lines:
Username,Year,Month,Match (0-60%),Match (60-65%),Match (65-70%),Match (70-75%),Match (75-80%),Match (80-85%),Match (85-90%),Match (90-95%),Match (95-100%),Match (100%),New_total,Edit_total,Review_total
Joe,2020,3,52,0,5,2,3,2,0,5,0,0,69,142,337
Engineering,2020,3,6469,0,0,0,0,0,0,0,0,0,6469,82,0
User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0
User_TR1_FR_FR,2020,3,66,3,0,0,0,0,0,0,0,2,71,82,0
User_TR1_JA_JP,2020,3,35,49,56,114,0,21,22,66,62,0,425,630,0
User_TR1_KO_KR,2020,3,60,0,0,0,0,0,10,0,0,0,70,0,0
User_TR1_NL_NL,2020,3,61,2,41,59,15,31,11,13,2,0,235,0,0
User_TR1_PL_PL,2020,3,134,17,41,57,15,31,21,13,0,0,329,15,0
User_TR1_PT_BR,2020,3,37,0,2,0,0,12,0,0,0,22,73,53,0
Marie,2020,3,37,15,8,2,0,2,0,5,0,0,69,40,0
Charly,2020,3,224,0,0,0,0,0,0,0,0,0,224,28,0
Joseph,2020,3,56,0,0,0,0,0,0,0,0,0,56,0,0
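I only want to create separate CSV files for the lines whose first column contains a User_XXX_XX_XX string. All other lines should be ignored. In the end, each generated file should be named after that first element, for example: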
User_TR1_ES_ES.csv
User_TR1_PT_BR.csv
So far my code is:
#!/usr/bin/env ruby
require 'csv'
require 'fileutils'
this_dir = File.expand_path(File.dirname(__FILE__))
original_dir = File.join(this_dir, '_Original')
#working with the .CSV file, there should be only one, and always be .CSV....
puts "Finding a .CSV file..."
full_path = Dir.glob('**/*.csv')
full_path.each do |csv|
  puts "CSV file found: #{File.basename(csv)}"
end
new_path = File.join(this_dir, full_path[0])
#I start reading the CSV file found in the folder
parsed_data = CSV.read(new_path)
#I grab the header in a separate variable
header = parsed_data.shift
#I created a constant to look for lines where the first element meets the desired string, not sure about it...
USER_NAME = 'User' + '_' + 'TR' + 1..10 + ([a-z].upcase * 2) + '_' + ([a-z].upcase * 2)
#I want to loop through each line and look for those that include the constant in the first element
CSV.foreach(new_path) do |row|
  row.first[0].include?(USER_NAME)
  #create individual files in the same location
  new_dir = File.join(this_dir, '_result')
  FileUtils.mkdir(new_dir)
  newfiles = File.join(new_dir, CONSTANT, '.csv')
  CSV.open(newfiles, 'w+') do |csv|
    csv << header
    csv << captured_row
  end
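The USER_NAME line in the question's code is not valid Ruby: it mixes strings, an Integer range and undefined names, so it raises an error instead of producing a pattern. A Regexp literal is the usual way to say "User_TR, a number, then two two-letter codes". A minimal sketch, assuming the number after TR runs from 1 to 10 and the codes are uppercase; USER_NAME_RE and user_row? are hypothetical names:

```ruby
# Hypothetical constant: "User_TR" + a number from 1 to 10 + "_XX_XX" in uppercase letters.
USER_NAME_RE = /\AUser_TR(?:10|[1-9])_[A-Z]{2}_[A-Z]{2}\z/

# Hypothetical helper: true when the first field of a parsed CSV row matches the pattern.
def user_row?(row)
  first = row.first
  !first.nil? && USER_NAME_RE.match?(first)
end

p user_row?(["User_TR1_ES_ES", "2020", "3"])  # => true
p user_row?(["Engineering", "2020", "3"])     # => false
```

\A and \z anchor the pattern to the whole field, so a value such as "User_TR11_ES_ES" or "User_TR1_es_es" is rejected.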
Personally, I wouldn't even bother treating the file as CSV, since that doesn't matter when you can easily grab the User… lines:
header = nil
DATA.each_line { |l|
  if header.nil?
    header = l
    next
  end
  fn = l[/^User_TR1_[^,]+/]
  next unless fn
  File.write(fn + '.csv', header + l)
}
__END__
Username,Year,Month,Match (0-60%),Match (60-65%),Match (65-70%),Match (70-75%),Match (75-80%),Match (80-85%),Match (85-90%),Match (90-95%),Match (95-100%),Match (100%),New_total,Edit_total,Review_total
Joe,2020,3,52,0,5,2,3,2,0,5,0,0,69,142,337
Engineering,2020,3,6469,0,0,0,0,0,0,0,0,0,6469,82,0
User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0
User_TR1_FR_FR,2020,3,66,3,0,0,0,0,0,0,0,2,71,82,0
User_TR1_JA_JP,2020,3,35,49,56,114,0,21,22,66,62,0,425,630,0
User_TR1_KO_KR,2020,3,60,0,0,0,0,0,10,0,0,0,70,0,0
User_TR1_NL_NL,2020,3,61,2,41,59,15,31,11,13,2,0,235,0,0
User_TR1_PL_PL,2020,3,134,17,41,57,15,31,21,13,0,0,329,15,0
User_TR1_PT_BR,2020,3,37,0,2,0,0,12,0,0,0,22,73,53,0
Marie,2020,3,37,15,8,2,0,2,0,5,0,0,69,40,0
Charly,2020,3,224,0,0,0,0,0,0,0,0,0,224,28,0
Joseph,2020,3,56,0,0,0,0,0,0,0,0,0,56,0,0
It creates:
-rw-r--r--@ 1 TTM staff 256B May 3 17:05 User_TR1_ES_ES.csv
-rw-r--r--@ 1 TTM staff 253B May 3 17:05 User_TR1_FR_FR.csv
-rw-r--r--@ 1 TTM staff 263B May 3 17:05 User_TR1_JA_JP.csv
-rw-r--r--@ 1 TTM staff 253B May 3 17:05 User_TR1_KO_KR.csv
-rw-r--r--@ 1 TTM staff 259B May 3 17:05 User_TR1_NL_NL.csv
-rw-r--r--@ 1 TTM staff 262B May 3 17:05 User_TR1_PL_PL.csv
-rw-r--r--@ 1 TTM staff 255B May 3 17:05 User_TR1_PT_BR.csv
which look like:
cat User_TR1_ES_ES.csv
Username,Year,Month,Match (0-60%),Match (60-65%),Match (65-70%),Match (70-75%),Match (75-80%),Match (80-85%),Match (85-90%),Match (90-95%),Match (95-100%),Match (100%),New_total,Edit_total,Review_total
User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0
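In this example I'm taking advantage of Ruby's ability to store data after __END__ in the code. DATA is created by Ruby as a file handle for the content after __END__, so pay no attention to the man behind the curtain. In your own script, simply use File.foreach to read the input file, grab the first line as the header, then loop around and read the following lines. From that point on, just look for lines that match the /^User_TR1_[^,]+/ pattern.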
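Pointed at a real file instead of DATA, the same approach could look like the minimal sketch below. It assumes the first line is the header; split_user_lines, input_path and out_dir are hypothetical names, and writing into an output directory (such as the question's _result folder) is an addition that is not part of the answer above:

```ruby
require 'fileutils'

# Hypothetical helper: copy the header plus each User_TR1_... line into its own file.
def split_user_lines(input_path, out_dir)
  FileUtils.mkdir_p(out_dir)  # unlike FileUtils.mkdir, does not fail if the directory exists
  header = nil
  File.foreach(input_path) do |line|
    if header.nil?
      header = line  # the first line is the header
      next
    end
    name = line[/^User_TR1_[^,]+/]  # nil for lines that are not User_TR1_... lines
    next unless name
    File.write(File.join(out_dir, "#{name}.csv"), header + line)
  end
end
```

For example, split_user_lines('report.csv', '_result') would write _result/User_TR1_ES_ES.csv and so on, where 'report.csv' stands in for your input file.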
Here's what gets returned:
'Engineering,2020,3,6469,0,0,0,0,0,0,0,0,0,6469,82,0'[/^User_TR1_[^,]+/] # => nil
'User_TR1_ES_ES,2020,3,112,3,0,0,0,14,10,0,0,2,141,3,0'[/^User_TR1_[^,]+/] # => "User_TR1_ES_ES"
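So, if the line isn't a User… line, nil is returned, causing the code to loop to the next line. If the line is a User… line, the string is returned and the code writes the header and the line out with File.write.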
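Also, if it were my system, I'd downcase the filenames when creating them. As a sysadmin I learned to avoid uppercase or mixed-case filenames because they make it easy to mistype a name: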
'User_TR1_ES_ES'.downcase + '.csv' # => "user_tr1_es_es.csv"
Also, see my comment above about using the Ruby class. It's easy to implement and very efficient.
You can try this code; it works even with large CSV files:
require 'csv'

CSV.foreach(path_to_your_csv_file, headers: true) do |line|
  # keep only rows whose first field looks like User_TR<digit>_XX_XX
  next unless line[0].to_s.match?(/User_TR\d_[a-zA-Z]{2}_[a-zA-Z]{2}/)

  file_name = "#{line[0].downcase}.csv"
  existed = File.exist?(file_name)   # File.exists? is deprecated and was removed in Ruby 3.2
  CSV.open(file_name, 'a+') do |row|
    row << line.headers unless existed  # write the header row only into a newly created file
    row << line
  end
end
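Because the output files are opened in 'a+' mode, running the script a second time appends the same rows again. If that matters, one option is to truncate each output file the first time its name is seen during the current run and append afterwards. A sketch under that assumption; split_by_user and the Set of seen names are illustrative, not part of the answer, and the pattern is anchored here so the whole field must match:

```ruby
require 'csv'
require 'set'

# Hypothetical helper: one output file per matching user, rebuilt from scratch on every run.
def split_by_user(input_path)
  seen = Set.new
  CSV.foreach(input_path, headers: true) do |line|
    name = line[0].to_s
    next unless name.match?(/\AUser_TR\d_[a-zA-Z]{2}_[a-zA-Z]{2}\z/)

    file_name = "#{name.downcase}.csv"
    first_time = seen.add?(file_name)  # Set#add? returns nil if the name was already present
    CSV.open(file_name, first_time ? 'w' : 'a') do |out|
      out << line.headers if first_time  # header only when the file is (re)created
      out << line
    end
  end
end
```

The rows are still streamed one at a time, so memory use stays flat for large inputs; only the set of file names is kept.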
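Let's first create a CSV file:
CSV_In = 't.csv'
We can then read the headers:
require 'csv'
csv = CSV.open(CSV_In, headers:true)
headers = csv.gets.headers
#=> ["Username", "Year", "Month", "Match (0-60%)"]
csv.close
See CSV::open, CSV#gets (aka shift) and CSV::Row#headers.
Now we can use foreach to open the CSV file and step through the rows, writing output files where appropriate:
CSV.foreach(CSV_In, headers: true) do |row|
  user_name = row['Username']
  CSV.open("#{user_name}.csv", "w") {|csv| csv << headers << row} if
    user_name.match?(/\AUser_TR1(?:_[A-Z]{2}){2}\z/)
end
The regex can be modified as needed. One possibility, for example, is:
/\AUser_[A-Z]{2}\d(?:_[A-Z]{2}){2}\z/
An advantage of using CSV methods (as opposed to IO and File methods) to read the input file is that the code does not need to change if the order of the columns changes later.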
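To illustrate that last point with made-up two-line inputs: row['Username'] finds the value by header name wherever the column sits, while a positional lookup such as row[0] (or a regex anchored at the start of the raw line) depends on the column order. username_lookups is a hypothetical helper:

```ruby
require 'csv'

# Return [lookup by header name, lookup by position 0] for each data row.
def username_lookups(text)
  CSV.parse(text, headers: true).map { |row| [row['Username'], row[0]] }
end

original  = "Username,Year\nUser_TR1_ES_ES,2020\n"
reordered = "Year,Username\n2020,User_TR1_ES_ES\n"

p username_lookups(original)   # => [["User_TR1_ES_ES", "User_TR1_ES_ES"]]
p username_lookups(reordered)  # => [["User_TR1_ES_ES", "2020"]]
```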