Ruby: Trouble parsing a CSV and looping over rows

ruby regex parsing csv

I have a CSV with a number of filenames and dates:

"doc_1.doc", "date1"
"doc_2.doc", "date2"
"doc_5.doc", "date5"

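The problem is that there are many gaps between the file numbers, for example between doc_2 and doc_5.

I am trying to write a script that parses the CSV, compares each row with the previous one, and fills in the gaps where necessary.

E.g. in this case it would add: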

"doc_3.doc", "date copied from date2"
"doc_4.doc", "date copied from date2"

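I am writing this script in Ruby because I am learning the language, and I have obviously misunderstood how Ruby loops work, since they are not the typical "for" loop I am used to from PHP.

This is the code I have so far. Any help with the loop itself would be much appreciated.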

#!/usr/bin/env ruby

require 'csv'

# Load file
csv_fname = './upload-list-docs.csv'

# Parsing function
def parse_csv(csv_fname)
    uploads = []
    last_number = 0

    # Regex to find number in doc_XXX.YYY
    regex_find_number = /(?<=\_)(.*?)(?=\.)/

    csv_content = CSV.read(csv_fname)

    # Skip header row
    csv_content.shift

    csv_content.each do |row|
        current_number = row[0].match regex_find_number
        current_date = row[1]
        last_date = current_date

        until last_number == current_number do
            uploads << [last_number, last_date]
            last_number += 1
        end
    end

    return uploads
end

puts parse_csv(csv_fname)

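The problem is not the loop (apart from the dangerous until last_number == current_number, which, as noted in the comments, should be changed to until last_number >= current_number), but extracting the integer from the regex match: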

current_number = row[0].match( regex_find_number )[0].to_i
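
As a small sketch of why that conversion matters (the filename below is just a sample value): String#match returns a MatchData object, which never compares equal to an Integer, so the original until condition could never become true.

```ruby
# The lookaround pattern from the question: text between "_" and the next "."
regex_find_number = /(?<=\_)(.*?)(?=\.)/

# Sample filename, used only for illustration.
match = "doc_5.doc".match(regex_find_number)

# match is a MatchData object, not a number.
never_equal = (5 == match)

# Index 0 is the whole matched text; to_i turns "5" into 5.
number = match[0].to_i

# String#[] with a regex returns the matched text directly.
string_form = "doc_5.doc"[regex_find_number]

puts match.class   # MatchData
puts never_equal   # false
puts number        # 5
puts string_form   # 5
```

Either match[0].to_i or the String#[] form followed by to_i gives an Integer that can safely be compared with last_number.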

Here is how I would write the code:

require 'csv'
csv_fname = './upload-list-docs.csv'

# Create a structure to get some easy methods:
Myfile = Struct.new(:name,:date){
  def number
    name[/(?<=\_)(.*?)(?=\.)/].to_i
  end
  def next_file
    Myfile.new(name.gsub(/(?<=\_)(.*?)(?=\.)/){|num|num.next}, date)
  end
}

# Read the content and add it to an array:
content = CSV.read(csv_fname)[1..-1].map{|data| Myfile.new(*data)}

# Add the first entry to a result array:
result = [content.shift]

until content.empty?

 # Get new file:
 new_file = content.shift

 # Fill up with new files until we hit next file:
 files_between = new_file.number - result.last.number
 unless files_between == 1
   (files_between - 1).times do
     result << result.last.next_file
   end
 end

 # Add next file:
 result << new_file

end

# Map result back to array:
result.map!(&:to_a)
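
To see what next_file does to a name in isolation, here is a minimal sketch. The sample names are made up for illustration. The gsub block receives the text between the underscore and the dot, and String#next increments it, carrying over to an extra digit when needed.

```ruby
# Same pattern as in Myfile: the text between "_" and the following "."
number_pattern = /(?<=\_)(.*?)(?=\.)/

# Hypothetical sample names, not taken from the real CSV.
after_two  = "doc_2.doc".gsub(number_pattern) { |num| num.next }
after_nine = "doc_9.doc".gsub(number_pattern) { |num| num.next }

puts after_two   # doc_3.doc
puts after_nine  # doc_10.doc
```

Because each filler entry is built from result.last, the copied date is always that of the last real row before the gap.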

An object-oriented approach. Note that I originally did it this way because I thought you needed to fill the gaps with [doc_X.doc, date] rather than [X, date]; the approach suits that case better, since it would need more regex work on @file_name. It may be a bit verbose now, but it still works and is quite readable.

require 'csv'

class Upload

  attr_reader :file_number, :date

  def initialize(file_name_or_number, date)
    @date = date
    @file_number = if file_name_or_number.is_a?(String)
                     file_name_or_number[/_(\d+)\./, 1].to_i
                   else
                     file_name_or_number
                   end
  end

  def to_a
    [@file_number, @date]
  end
end

class UploadCollection

  attr_reader :uploads

  def initialize(input_file)
    # Slice off all but the first element
    input_data = CSV.read(input_file)[1..-1] 
    # Create an array of Upload objects and sort by file number
    @uploads = input_data
                  .map { |row| Upload.new(row[0], row[1]) }
                  .sort_by(&:file_number)
  end

  def fill_blanks!
    # Get the smallest and largest file number
    # (they're sorted this way, remember)
    min, max = @uploads.first.file_number, @uploads.last.file_number
    # Create an array of all numbers between min and max, and
    # remove those elements already representing a file number
    missing = (min..max).to_a - @uploads.map(&:file_number)
    missing.each do |num|
      # Explaining how this works makes my head ache.  Check out the
      # docs for Array#insert.
      @uploads.insert(num - 1, Upload.new(num, @uploads[num-2].date))
    end

    # Non-ambiguous return value
    true
  end

  def to_a
    @uploads.map(&:to_a)
  end

  def write_csv(file_path)
    CSV.open(file_path, 'wb') do |csv|
      csv << ['file_number', 'date'] # Headers
      to_a.each { |u| csv << u }
    end
  end
end

file = 'fnames.csv'
collection = UploadCollection.new(file)
collection.fill_blanks!
puts collection.to_a
collection.write_csv('out.csv')
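
One caveat: the index arithmetic in fill_blanks! (num - 1 and num - 2) appears to line up with array positions only when the smallest file number is 1. As a hedged alternative, here is a minimal sketch that fills gaps by walking sorted [number, date] pairs instead of computing indices. fill_gaps is a hypothetical helper, not part of the classes above, and the sample rows are invented.

```ruby
# Hypothetical helper: takes [number, date] pairs and returns a new array
# with every missing number filled in, copying the date of the last real
# row before the gap.
def fill_gaps(rows)
  sorted = rows.sort_by(&:first)
  sorted.each_with_object([]) do |(number, date), filled|
    unless filled.empty?
      prev_number, prev_date = filled.last
      # Exclusive range: every number strictly between the two real rows.
      ((prev_number + 1)...number).each { |n| filled << [n, prev_date] }
    end
    filled << [number, date]
  end
end

# Invented sample data that does not start at 1.
rows = [[20, "d20"], [21, "d21"], [25, "d25"], [26, "d26"]]
filled = fill_gaps(rows)

filled.each { |number, date| puts "#{number},#{date}" }
```

The resulting pairs can be written out with the same CSV.open block that write_csv uses.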

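What happens when you run this code? You get an endless loop, right?

Yes, an endless loop, because current_number never changes.

The first thing to do: disable the endless inner loop by changing it to until last_number >= current_number do … that will give you a clue.

@waffl current_number does not have to change, since last_number is changing. As long as one of them is changing (and getting closer to the termination condition), this works perfectly.

Can you explain how I could make the output just the number (the result of the regex, without the prefix and suffix)? I am trying to modify the line that builds next_file: Myfile.new(name.gsub(/(?…

Not sure exactly what you mean. Maybe use string[regexp] instead of string.match(regexp). It returns a string instead of a match object. If that is not what you want, you need to rephrase your question.

Well, instead of doc_1.jpg, date I would like it to be 1, date.

Ok, so in my version it would be: Myfile.new(name[/(?…

Unfortunately this does not seem to work, as I believe the problem really is in the loop. The each function is not iterating over the CSV in order, which I think is why the loop is infinite.

This works perfectly. I just changed def row [@file_name, @date] end to def row [@file_number, @date] end to get a list containing only the numbers. My only remaining question is how to output the array to a CSV file.

Wow, that was easy. Thanks :)

Actually, I have noticed that this seems to insert only one record into a gap: for example, 20,21,25,26 becomes 20,21,22,25,26.

You are right, of course. I am not sure how simple it is to fix.

I have combined it with the other answer and am very close to a solution. The only problem is that I now get, for example, 7,8,9,9,12 from 7,8,12, which I think is because the file number is not being updated.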