Querying a CSV::Table to find the item with the most sales between two given dates in a plain old Ruby script


I am struggling to find the highest sales between two given dates.

Here is my ad_report.csv file, with headers:

date,impressions,clicks,sales,ad_spend,keyword_id,asin
2017-06-19,4451,1006,608,24.87,UVOLBWHILJ,63N02JK10S
2017-06-18,5283,3237,1233,85.06,UVOLBWHILJ,63N02JK10S
2017-06-17,0,0,0,21.77,UVOLBWHILJ,63N02JK10S
...

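Below is all of my working code. It returns the row with the highest sales value, but without restricting the search to the given dates.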

require 'csv'
require 'date'

# get directory of the current file
LIB_DIR = File.dirname(__FILE__)

# get the absolute path of the ad_report & product_report CSV
# and set to a var
AD_CSV_PATH = File.expand_path('data/ad_report.csv', LIB_DIR)
PROD_CSV_PATH = File.expand_path('data/product_report.csv', LIB_DIR)

# create CSV::Table for ad-ad_report and product_report CSV
ad_report_table = CSV.parse(File.read(AD_CSV_PATH), headers: true)
prod_report_table = CSV.parse(File.read(PROD_CSV_PATH), headers: true)

## finds the row with the highest sales
sales_row = ad_report_table.max_by { |row| row[3].to_i }
At this point I can get the row that has the greatest sales, and all the data from that row, but it is not in the expected date range.

Below I am trying to use range with the preset dates.

## range of date for items between
first_date = Date.new(2017, 05, 02)
last_date = Date.new(2017, 05, 31)
range = (first_date...last_date)

puts sales_row

Below is pseudocode for what I feel I should do, but there may be a better method.

## check for highest sales
## return sales if between date
## else reject col if 
## loop this until it returns date between
## return result

You can create a range from the two dates and then use Range#cover? to test whether a date falls within that range:

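range = Date.new(2015, 1, 1)..Date.new(2020, 1, 1)
# rows is the parsed CSV::Table (ad_report_table in the question)
rows.select do |row|
  # look the columns up by header name rather than by position
  range.cover?(Date.parse(row['date']))
end.max_by { |row| row['sales'].to_i }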
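As a self-contained sketch of the Range#cover? approach above, the following parses the three sample rows from the question and picks the row with the highest sales inside a date range. The helper name top_seller_between and the chosen range are assumptions made for illustration; they are not part of the original answer.

```ruby
require 'csv'
require 'date'

# Sample rows copied from the question's ad_report.csv
SAMPLE = <<~ROWS
  date,impressions,clicks,sales,ad_spend,keyword_id,asin
  2017-06-19,4451,1006,608,24.87,UVOLBWHILJ,63N02JK10S
  2017-06-18,5283,3237,1233,85.06,UVOLBWHILJ,63N02JK10S
  2017-06-17,0,0,0,21.77,UVOLBWHILJ,63N02JK10S
ROWS

# Hypothetical helper: returns the CSV::Row with the highest 'sales'
# among rows whose 'date' lies in date_range, or nil if none match.
def top_seller_between(table, date_range)
  table
    .select { |row| date_range.cover?(Date.parse(row['date'])) }
    .max_by { |row| row['sales'].to_i }
end

table = CSV.parse(SAMPLE, headers: true)

# Two dots include the end date; three dots (as in the question) exclude it.
range = Date.new(2017, 6, 17)..Date.new(2017, 6, 18)
best = top_seller_between(table, range)
puts best['date'] if best   # prints 2017-06-18
```

With the question's May 2017 range the helper returns nil for this sample, since none of the three rows fall in May.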

That said, the Tin Man is completely right that you should use a database instead.

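You can obtain the desired value as shown below. I have assumed that the field of interest ('sales') holds integer values. If it does not, change .to_i to .to_f in the code below.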

Code

require 'csv'

def greatest(fname, max_field, date_field, date_range)
  largest = nil
  CSV.foreach(fname, headers:true) do |csv|
    largest = { row: csv.to_a, value: csv[max_field].to_i } if
      date_range.cover?(csv[date_field]) &&
      (largest.nil? || csv[max_field].to_i > largest[:value])
  end
  largest.nil? ? nil : largest[:row].to_h
end

Example

Let's first create a CSV file.

str =<<~END
date,impressions,clicks,sales,ad_spend,keyword_id,asin
2017-06-19,4451,1006,608,24.87,UVOLBWHILJ,63N02JK10S
2017-06-18,5283,3237,1233,85.06,UVOLBWHILJ,63N02JK10S
2017-06-17,0,0,0,21.77,UVOLBWHILJ,63N02JK10S
2017-06-20,4451,1006,200000,24.87,UVOLBWHILJ,63N02JK10S
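END
fname = 'ad_report_sample.csv'  # hypothetical file name, used only for this example
File.write(fname, str)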

Now find the record with the largest 'sales' value within a given date range.

greatest(fname, 'sales', 'date', '2017-06-17'..'2017-06-19')    
  #=> {"date"=>"2017-06-18", "impressions"=>"5283", "clicks"=>"3237",
  #    "sales"=>"1233", "ad_spend"=>"85.06", "keyword_id"=>"UVOLBWHILJ",
  #    "asin"=>"63N02JK10S"} 
greatest(fname, 'sales', 'date', '2017-06-17'..'2017-06-25')
  #=> {"date"=>"2017-06-20", "impressions"=>"4451", "clicks"=>"1006",
  #   "sales"=>"200000", "ad_spend"=>"24.87", "keyword_id"=>"UVOLBWHILJ",
  #   "asin"=>"63N02JK10S"}
greatest(fname, 'sales', 'date', '2017-06-22'..'2017-06-25')    
  #=> nil

I read the file line by line (using CSV.foreach) to keep memory requirements to a minimum, which may be essential if the file is large.
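The same line-by-line idea can also be sketched with an enumerator chain: called without a block, CSV.foreach returns an Enumerator, and making it lazy lets select and max_by process one row at a time instead of building an intermediate array. The helper name greatest_streaming and the temporary file are assumptions for illustration only.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: streams the file row by row and returns the
# best matching row as a Hash, or nil if no row is in the range.
def greatest_streaming(fname, max_field, date_field, date_range)
  best = CSV.foreach(fname, headers: true) # no block given: returns an Enumerator
            .lazy                          # avoid materializing the filtered rows
            .select { |row| date_range.cover?(row[date_field]) }
            .max_by { |row| row[max_field].to_i }
  best&.to_h
end

# Write the question's sample rows to a temporary file for the demo.
sample = Tempfile.new(['ad_report', '.csv'])
sample.write(<<~ROWS)
  date,impressions,clicks,sales,ad_spend,keyword_id,asin
  2017-06-19,4451,1006,608,24.87,UVOLBWHILJ,63N02JK10S
  2017-06-18,5283,3237,1233,85.06,UVOLBWHILJ,63N02JK10S
  2017-06-17,0,0,0,21.77,UVOLBWHILJ,63N02JK10S
ROWS
sample.close

result = greatest_streaming(sample.path, 'sales', 'date', '2017-06-17'..'2017-06-18')
puts result['sales']   # prints 1233
```

Only the current best row is held in memory while the file is scanned, which is the property the explicit loop in greatest relies on as well.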

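Note that, because the dates are in 'yyyy-mm-dd' format, it is not necessary to convert the two dates to Date objects in order to compare them; that is, they can be compared as strings (for example, '2017-06-17' <= '2017-06-18' #=> true).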
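A short check of that claim, as a sketch: zero-padded 'yyyy-mm-dd' strings sort in the same order as the dates they represent, so a Range of strings works with cover?. The variable names are illustrative, and the last line shows an assumption worth verifying in real data, namely that every date is zero-padded.

```ruby
require 'date'

dates = %w[2017-06-19 2017-06-17 2017-06-18]

# Sorting the strings gives the same order as sorting by the parsed Date objects.
same_order = dates.sort == dates.sort_by { |s| Date.parse(s) }

# A Range of strings compares its endpoints with String#<=>.
string_range = '2017-06-17'..'2017-06-19'
in_range = string_range.cover?('2017-06-18')

# Caveat: without zero padding, string order no longer matches date order,
# because '9' sorts after '1' character by character.
unpadded_ok = '2017-6-9' < '2017-6-10'

puts same_order   # prints true
puts in_range     # prints true
puts unpadded_ok  # prints false
```

If the file might contain unpadded or differently formatted dates, parsing each value with Date.parse and using a Range of Date objects is the safer route.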

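CSV is not a substitute for a database. Rather, it is a file format for moving data between spreadsheets or databases. Searching or reading a CSV file to build a report is a slow way to determine any kind of information. Eventually, once the CSV grows too large to fit in memory, reading it is likely to cause problems. I'd suggest looking into a DBM, as even SQLite would make this task faster and simpler. Using an ORM would also make your code more flexible; I have one I recommend, but YMMV.

Welcome to SO! I don't see where you tried to compare against your date range. Please try writing that and, if you run into a problem, ask a question about it. As it stands, your question is premature. The site's help pages explain the process.

@theTinMan, in general I can't disagree, but there may be situations where the CSV file is a given, produced by someone else, and only needs to be read once, in which case reading it into a database and then extracting the information of interest may not make sense. Moreover, in that case there is no need to load the whole CSV file into memory; it can be read line by line.

@CarySwoveland That might hold for a report or summary, where the file can be thrown away immediately and the summary passed along, but it is a situation I have never encountered in all my years. A summary would be generated rather than passed along in a CSV database dump. If the data originates inside the organization or company, it can be accessed from within the DBM. A database is much faster and can do the same lookup in a fraction of the time. This sounds like a case of an XY problem...

It runs very slowly. OK, I'd like to finish this solution and then I'll try using a database. Thank you.

What did you expect? You are parsing the worst format known to man and iterating over a huge object in Ruby. Of course it's slow.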