Ruby: Problem parsing a CSV and looping over rows

Tags: ruby, regex, parsing, csv

I have a CSV with multiple file names and dates:

"doc_1.doc", "date1"
"doc_2.doc", "date2"
"doc_5.doc", "date5"
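
The problem is that there are many gaps in the file numbering, e.g. between doc_2 and doc_5.

I'm trying to write a script that parses the CSV and fills in the gaps by comparing each row with the previous one.

E.g. in this case it would add: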

"doc_3.doc", "date copied from date2"
"doc_4.doc", "date copied from date2"
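
I'm trying to write this script in Ruby because I'm learning the language, and I have clearly misunderstood how Ruby loops work, since they are not the typical "for" loop I use all the time in PHP.

Here is the code I have so far; any help with the loop itself would be much appreciated.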

#!/usr/bin/env ruby

require 'csv'

# Load file
csv_fname = './upload-list-docs.csv'

# Parsing function
def parse_csv(csv_fname)
    uploads = []
    last_number = 0

    # Regex to find number in doc_XXX.YYY
    regex_find_number = /(?<=\_)(.*?)(?=\.)/

    csv_content = CSV.read(csv_fname)

    # Skip header row
    csv_content.shift

    csv_content.each do |row|
        current_number = row[0].match regex_find_number
        current_date = row[1]
        last_date = current_date

        until last_number == current_number do
            uploads << [last_number, last_date]
            last_number += 1
        end
    end

    return uploads
end

puts parse_csv(csv_fname)

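The problem is not the loop itself (apart from the dangerous ==, which, as mentioned above, should be changed to >=), but the extraction of the integer from the regex match. match returns a MatchData object, which never compares equal to an integer, so the until condition is never satisfied. Take the matched text and convert it: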

current_number = row[0].match( regex_find_number )[0].to_i
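
With both changes applied, the loop from the question could be sketched roughly as follows. This is only an illustration under a few assumptions: fill_gaps is a hypothetical name, it takes already-parsed rows instead of a file name, it assumes the rows are sorted by file number, and it copies the date of the previous row into each gap (as the question describes) rather than the date of the current row.

```ruby
require 'csv'

# Hypothetical helper (not from the original post): fills numbering gaps
# in rows shaped like ["doc_N.ext", "date"], assumed sorted by N.
def fill_gaps(rows)
  uploads = []
  last_number = nil
  last_date = nil

  rows.each do |name, date|
    # String#[] with a regex returns the matched text (or nil), so to_i is safe
    current_number = name[/(?<=_)\d+(?=\.)/].to_i

    unless last_number.nil?
      # Emit one entry per missing number, reusing the previous row's date.
      # The range is empty when there is no gap, so it can never loop forever.
      ((last_number + 1)...current_number).each do |n|
        uploads << [n, last_date]
      end
    end

    uploads << [current_number, date]
    last_number = current_number
    last_date = date
  end

  uploads
end

# Sample data taken from the question (unquoted for simplicity)
rows = CSV.parse(<<~DATA)
  doc_1.doc,date1
  doc_2.doc,date2
  doc_5.doc,date5
DATA

p fill_gaps(rows)
```

Iterating over a range instead of an until loop means a bad comparison can at worst produce no gap entries; it cannot spin endlessly.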

Here is how I would write the code:

require 'csv'
csv_fname = './upload-list-docs.csv'

# Create a structure to get some easy methods:
Myfile = Struct.new(:name,:date){
  def number
    name[/(?<=\_)(.*?)(?=\.)/].to_i
  end
  def next_file
    Myfile.new(name.gsub(/(?<=\_)(.*?)(?=\.)/){|num|num.next}, date)
  end
}

# Read the content and add it to an array:
content = CSV.read(csv_fname)[1..-1].map{|data| Myfile.new(*data)}

# Add first entry to a result array:
result = [content.shift]

until content.empty?

 # Get new file:
 new_file = content.shift

 # Fill up with new files until we hit next file:
 files_between = new_file.number - result.last.number
 unless files_between == 1
   (files_between - 1).times do
     result << result.last.next_file
   end
 end

 # Add next file:
 result << new_file

end

# Map result back to array:
result.map!(&:to_a)
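
To see what next_file does in isolation, here is a small sketch that reproduces the Struct from the code above and runs it on an in-memory entry. The last two lines are a hypothetical variation, not part of the answer: they map the entries to [number, date] pairs instead of [name, date].

```ruby
# Same Struct as above, reproduced so the snippet runs on its own
Myfile = Struct.new(:name, :date) do
  def number
    name[/(?<=_)(.*?)(?=\.)/].to_i
  end

  def next_file
    # String#next turns "2" into "3" and "9" into "10"
    Myfile.new(name.gsub(/(?<=_)(.*?)(?=\.)/) { |num| num.next }, date)
  end
end

file = Myfile.new('doc_2.doc', 'date2')
successor = file.next_file
p successor.to_a # name is incremented, date is copied

# Hypothetical variation: output only the number instead of the full name
files = [file, successor]
p files.map { |f| [f.number, f.date] }
```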
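An object-oriented approach. Note that I did it this way because I thought you needed to fill the gaps with [doc_X.doc, date] rather than [X, date]; for @file_name this approach is more appropriate, since that would require more regex work. It may be a little verbose now, but it still works and is quite readable.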

require 'csv'

class Upload

  attr_reader :file_number, :date

  def initialize(file_name_or_number, date)
    @date = date
    @file_number = if file_name_or_number.is_a?(String)
                     file_name_or_number[/_(\d+)\./, 1].to_i
                   else
                     file_name_or_number
                   end
  end

  def to_a
    [@file_number, @date]
  end
end

class UploadCollection

  attr_reader :uploads

  def initialize(input_file)
    # Drop the header row, keeping every row after the first
    input_data = CSV.read(input_file)[1..-1] 
    # Create an array of Upload objects and sort by file number
    @uploads = input_data
                  .map { |row| Upload.new(row[0], row[1]) }
                  .sort_by(&:file_number)
  end

  def fill_blanks!
    # Get the smallest and largest file number
    # (they're sorted this way, remember)
    min, max = @uploads.first.file_number, @uploads.last.file_number
    # Create an array of all numbers between min and max, and
    # remove those elements already representing a file number
    missing = (min..max).to_a - @uploads.map(&:file_number)
    missing.each do |num|
      # Explaining how this works makes my head ache.  Check out the
      # docs for Array#insert.
      @uploads.insert(num - 1, Upload.new(num, @uploads[num-2].date))
    end

    # Non-ambiguous return value
    true
  end

  def to_a
    @uploads.map(&:to_a)
  end

  def write_csv(file_path)
    CSV.open(file_path, 'wb') do |csv|
      csv << ['file_number', 'date'] # Headers
      to_a.each { |u| csv << u }
    end
  end
end

file = 'fnames.csv'
collection = UploadCollection.new(file)
collection.fill_blanks!
puts collection.to_a
collection.write_csv('out.csv')
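
One caveat: the index arithmetic in fill_blanks! (inserting at num - 1) lines up only when the file numbers start at 1. A minimal alternative sketch that avoids index arithmetic by walking consecutive pairs is shown below. The helper name fill_blanks and the plain [number, date] pairs are assumptions made for illustration; this is not the answer's class.

```ruby
# Hypothetical stand-alone helper working on [file_number, date] pairs.
# Gaps are filled with the date of the entry just before the gap.
def fill_blanks(pairs)
  return [] if pairs.empty?

  sorted = pairs.sort_by(&:first)
  filled = [sorted.first]

  # each_cons(2) yields every adjacent pair: (1st, 2nd), (2nd, 3rd), ...
  sorted.each_cons(2) do |(prev_num, prev_date), (num, date)|
    ((prev_num + 1)...num).each { |n| filled << [n, prev_date] }
    filled << [num, date]
  end

  filled
end

p fill_blanks([[20, 'a'], [21, 'b'], [25, 'c'], [26, 'd']])
```

Because each gap is derived from the two entries around it, the starting number does not matter and several missing numbers in a row are all filled.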
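
What happens when you run this code? You get an endless loop, right?

Yes, an endless loop, because current_number never changes.

The first thing to do: disable the endless inner loop by changing it to until last_number >= current_number do ... that will give you a clue.

@waffl current_number doesn't have to change, since last_number is changing. As long as one of them changes (and gets closer to the termination condition), this works perfectly.

Could you explain how I can make the output just the number (the result of the regex, without the prefix and suffix)? I'm trying to modify the line that constructs next_file: Myfile.new(name.gsub(/(? …

Not sure exactly what you mean. Perhaps use string[regexp] instead of string.match(regexp). It returns a string rather than a match object. If that's not what you want, you need to rephrase your question.

Hmm, I'd like it to be 1,date rather than doc_1.jpg,date

Ok, so in my version it would be: Myfile.new(name[/(? …

Unfortunately this doesn't seem to work, as I believe the problem really is in the loop. The each function isn't iterating over the CSV in order, which I think is why the loop is infinite.

This works great. I just changed def row [@file_name, @date] end to def row [@file_number, @date] end to get a list with only the numbers. My only remaining question is how to output the array to a CSV file?

Wow, simple. Thanks :)

Actually, I've noticed that this seems to insert only one record into each gap: for example, 20,21,25,26 becomes 20,21,22,25,26

You are of course right. I'm not sure how simple it is to fix.

I've combined it with the other answer and am very close to a solution. The only problem is that I now get, for example, from 7,8,12 to 7,8,9,9,12, which I think is because the file number is not being updated.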
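
The difference between string.match(regexp) and string[regexp] mentioned in the comments can be seen in a short sketch. The helper file_number is a hypothetical name used only for this illustration; its pattern is the capture-group form used in the Upload class.

```ruby
name = 'doc_12.doc'
pattern = /(?<=_)(.*?)(?=\.)/

# String#match returns a MatchData object (or nil), not the matched text
m = name.match(pattern)
p m.class
p m == 12            # a MatchData never equals an Integer

# String#[] with a regex returns the matched substring (or nil)
p name[pattern]
p name[pattern].to_i == 12

# Hypothetical helper: extract the number via a capture group instead of
# lookarounds. nil.to_i is 0, so names without a number yield 0.
def file_number(name)
  name[/_(\d+)\./, 1].to_i
end

p file_number('doc_12.doc')
p file_number('readme.txt')
```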