Ruby: Trouble parsing a CSV and looping over its rows
I have a CSV with a number of file names and dates:
"doc_1.doc", "date1"
"doc_2.doc", "date2"
"doc_5.doc", "date5"
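The problem is that there are many gaps between the file numbers, for example between doc_2 and doc_5.
I am trying to write a script that parses the CSV and fills in the gaps by comparing each row with the previous one and adding the missing entries where necessary.
For example, in this case it would add: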
"doc_3.doc", "date copied from date2"
"doc_4.doc", "date copied from date2"
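I am trying to write this script in Ruby because I am learning the language, and I have clearly misunderstood how Ruby loops work, since they are not the typical "for" loop I often use in PHP.
Here is my code so far; any help with the loop itself would be greatly appreciated.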
#!/usr/bin/env ruby
require 'csv'

# Load file
csv_fname = './upload-list-docs.csv'

# Parsing function
def parse_csv(csv_fname)
  uploads = []
  last_number = 0
  # Regex to find number in doc_XXX.YYY
  regex_find_number = /(?<=\_)(.*?)(?=\.)/
  csv_content = CSV.read(csv_fname)
  # Skip header row
  csv_content.shift
  csv_content.each do |row|
    current_number = row[0].match regex_find_number
    current_date = row[1]
    last_date = current_date
    until last_number == current_number do
      uploads << [last_number, last_date]
      last_number += 1
    end
  end
  return uploads
end

puts parse_csv(csv_fname)
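The problem is not the loop (apart from the dangerous ==, which, as noted in the comments, should be changed to >=) but extracting the integer from the regex match: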
current_number = row[0].match( regex_find_number )[0].to_i
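To illustrate why the original comparison can never succeed, here is a small check. It assumes a sample file name of doc_5.doc, made up for this example. String#match returns a MatchData object, which does not compare equal to an Integer, so until last_number == current_number has no way to terminate.

```ruby
# Regex from the question: the text between "_" and "."
regex_find_number = /(?<=\_)(.*?)(?=\.)/

name = 'doc_5.doc' # sample value, assumed for illustration

as_match_data = name.match(regex_find_number) # MatchData, not a number
as_string     = name[regex_find_number]       # String#[] returns the matched text
as_integer    = as_match_data[0].to_i         # the fix from the answer

puts as_match_data.class # MatchData
puts as_match_data == 5  # false
puts as_integer == 5     # true
```

Either match(...)[0].to_i or name[regex].to_i gives an Integer that can be compared with last_number.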
Here is how I would write the code:
require 'csv'

csv_fname = './upload-list-docs.csv'

# Create a structure to get some easy methods:
Myfile = Struct.new(:name, :date) {
  def number
    name[/(?<=\_)(.*?)(?=\.)/].to_i
  end

  def next_file
    Myfile.new(name.gsub(/(?<=\_)(.*?)(?=\.)/) { |num| num.next }, date)
  end
}

# Read the content (skipping the header row) and add it to an array:
content = CSV.read(csv_fname)[1..-1].map { |data| Myfile.new(*data) }

# Add the first entry to a result array:
result = [content.shift]

until content.empty?
  # Get new file:
  new_file = content.shift
  # Fill up with new files until we hit next file:
  files_between = new_file.number - result.last.number
  unless files_between == 1
    (files_between - 1).times do
      result << result.last.next_file
    end
  end
  # Add next file:
  result << new_file
end

# Map result back to array:
result.map!(&:to_a)
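As a quick illustration of what next_file produces, here is a minimal sketch. The struct is repeated so the snippet runs on its own, and the sample row is an assumption made for this example.

```ruby
# Same struct as in the answer, repeated so this snippet is self-contained.
Myfile = Struct.new(:name, :date) do
  def number
    name[/(?<=\_)(.*?)(?=\.)/].to_i
  end

  def next_file
    # String#next turns "2" into "3"; the date is carried over unchanged.
    Myfile.new(name.gsub(/(?<=\_)(.*?)(?=\.)/) { |num| num.next }, date)
  end
end

# Sample row, assumed for illustration.
file = Myfile.new('doc_2.doc', 'date2')
filler = file.next_file

puts filler.name   # doc_3.doc
puts filler.number # 3
puts filler.date   # date2
```

Because each filler is built from the last entry in result, a gap of several numbers is filled one step at a time, each step incrementing the previous name.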
An object-oriented approach. Note that I wrote it this way because I thought you needed to fill the gaps with [doc_X.doc, date] rather than [X, date]; that case suits this approach better, since it needs more regex work on @file_name. It may be a bit verbose now, but it still works and is quite readable.
require 'csv'

class Upload
  attr_reader :file_number, :date

  def initialize(file_name_or_number, date)
    @date = date
    @file_number = if file_name_or_number.is_a?(String)
      file_name_or_number[/_(\d+)\./, 1].to_i
    else
      file_name_or_number
    end
  end

  def to_a
    [@file_number, @date]
  end
end

class UploadCollection
  attr_reader :uploads

  def initialize(input_file)
    # Drop the header row (keep everything after the first element)
    input_data = CSV.read(input_file)[1..-1]
    # Create an array of Upload objects and sort by file number
    @uploads = input_data
      .map { |row| Upload.new(row[0], row[1]) }
      .sort_by(&:file_number)
  end

  def fill_blanks!
    # Get the smallest and largest file number
    # (they're sorted this way, remember)
    min, max = @uploads.first.file_number, @uploads.last.file_number
    # Create an array of all numbers between min and max, and
    # remove those elements already representing a file number
    missing = (min..max).to_a - @uploads.map(&:file_number)
    missing.each do |num|
      # Explaining how this works makes my head ache. Check out the
      # docs for Array#insert.
      # Note: this index arithmetic assumes file numbers start at 1.
      @uploads.insert(num - 1, Upload.new(num, @uploads[num - 2].date))
    end
    # Non-ambiguous return value
    true
  end

  def to_a
    @uploads.map(&:to_a)
  end

  def write_csv(file_path)
    CSV.open(file_path, 'wb') do |csv|
      csv << ['file_number', 'date'] # Headers
      to_a.each { |u| csv << u }
    end
  end
end

file = 'fnames.csv'
collection = UploadCollection.new(file)
collection.fill_blanks!
puts collection.to_a
collection.write_csv('out.csv')
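What happens when you run this code? You get an endless loop, right?
Yes, an endless loop, because current_number never changes.
First thing to do: disable the endless inner loop by changing it to until last_number >= current_number do ... that will give you a clue.
@waffl current_number doesn't have to change, since last_number is changing. As long as one of them is changing (and getting closer to the terminating condition), this works perfectly well.
Can you explain how I could make the output just the number (the result of the regex, without the prefix and suffix)? I am trying to modify the line in next_file that builds Myfile.new(name.gsub(...
Not sure exactly what you mean. Maybe use string[regexp] instead of string.match(regexp). It returns a string instead of a match object. If that's not what you want, you need to rephrase your question.
Hmm, I'd like it to be 1, date rather than doc_1.jpg, date.
Ok, so in my version it would be: Myfile.new(name[/(?...
Unfortunately this doesn't seem to work, because I believe the problem really is in the loop. The each function isn't iterating over the CSV in order, which I think is why the loop is infinite.
This works great. I just changed def row [@file_name, @date] end to def row [@file_number, @date] end to get a list with just the numbers. My only remaining question is how to output the array to a CSV file.
Wow, that was easy. Thanks :)
Actually, I've noticed that this seems to insert only one record into each gap: for example, 20,21,25,26 becomes 20,21,22,25,26.
You're right, of course. I'm not sure how simple it is to fix.
I've combined it with the other answer and am very close to a solution. The only problem is that I now get, for example, from 7,8,12 to 7,8,9,9,12, which I think is because the file number isn't being updated.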
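The comments above report gaps being filled only partly (20,21,25,26 becoming 20,21,22,25,26, and 7,8,12 becoming 7,8,9,9,12). As a possible way around that, here is a minimal sketch that works on plain [number, date] pairs and does not depend on array positions or on numbering starting at 1. The helper name fill_gaps and the sample rows are assumptions made for this example, not part of either answer.

```ruby
# Fill numbering gaps in rows of [number, date], copying the date of the
# last real row before each gap. `fill_gaps` is a hypothetical helper name.
def fill_gaps(rows)
  sorted = rows.sort_by(&:first)
  sorted.each_with_object([]) do |(number, date), filled|
    unless filled.empty?
      prev_number, prev_date = filled.last
      # Add one filler row for every missing number in the gap.
      ((prev_number + 1)...number).each { |n| filled << [n, prev_date] }
    end
    filled << [number, date]
  end
end

# Sample rows, assumed for illustration.
rows = [[7, 'date7'], [8, 'date8'], [12, 'date12']]
filled = fill_gaps(rows)

puts filled.map(&:first).join(',') # 7,8,9,10,11,12
```

The rows returned by fill_gaps can be written out in the same way as write_csv does above, by appending each pair to the CSV object.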