Ruby: grouping a large array of hashes by date range

ruby arrays hash

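I have a large array of hashes (~5MB) that I need to group by rolling date ranges.

Here is the Ruby method that turns the array into the rolling dataset I'm looking for: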

def rolling(options = {})
  rolling_items = []

  options[:date_range].each do |day|
    start_date = rolling_start_date(day)
    end_date = day

    range = start_date..end_date

    new_items = options[:data].select{|key, value| range.cover? Date.parse(key[:created].to_s)}.uniq { |h| h[:customer] }

    amount = new_items.count


    rolling_items.push({created: day, amount: amount})
  end

  rolling_items
end

This calls a rolling_start_date method, which takes a given day and returns the start date of that day's rolling range:

def rolling_start_date(end_date)
  old = Time.utc(end_date.year, end_date.month, end_date.day)
  previous = old - 1.month

  if old.day > previous.day
     start_date = previous + 1.day
  else
     start_date = old - 1.month + 1.day
  end

  start_date.to_date
end
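
To make the window concrete, here is a small sketch of what the method returns for a few days. It assumes plain Ruby instead of ActiveSupport: rolling_start_date_plain is a made-up name, and Date#<< (one month back, clamped to the end of the month) stands in for `- 1.month`, which as far as I can tell clamps the same way. Note that both branches of the if/else above compute the same value, the previous month's date plus one day, so the sketch needs only one expression.

```ruby
require 'date'

# Hypothetical plain-Ruby stand-in for rolling_start_date.
# Date#<< steps back one month and clamps to the last day of that month.
def rolling_start_date_plain(end_date)
  (end_date << 1) + 1
end

[Date.new(2014, 2, 21), Date.new(2014, 3, 1), Date.new(2014, 3, 31)].each do |day|
  puts "#{rolling_start_date_plain(day)}..#{day}"
end
# 2014-01-22..2014-02-21
# 2014-02-02..2014-03-01
# 2014-03-01..2014-03-31
```

So the window is roughly one calendar month long and its start never moves backwards as the end day advances.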

I call the rolling method like this:

rolling(date_range: Date.current.beginning_of_day - 1.year..Date.current.end_of_day, data: customers)

customers is the array of customer hashes passed as data in the call above.

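So the rolling method loops over every day in date_range, finds its rolling_start_date, selects the hashes that fall inside that new date range, counts the unique customers, and pushes the result onto the new rolling_items array, so I end up with an array that looks like this: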
[
   {:created=>Fri, 21 Feb 2014, :amount=>2711}, 
   {:created=>Sat, 22 Feb 2014, :amount=>2716}, 
   {:created=>Sun, 23 Feb 2014, :amount=>2720}, 
   {:created=>Mon, 24 Feb 2014, :amount=>2731}, 
   {:created=>Tue, 25 Feb 2014, :amount=>2746}, 
   {:created=>Wed, 26 Feb 2014, :amount=>2761}, 
   {:created=>Thu, 27 Feb 2014, :amount=>2765}, 
   {:created=>Fri, 28 Feb 2014, :amount=>2754}, 
   ...
]

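...where each hash holds the total number of unique customers within its date range.

So I need to figure out how to get the unique customer count for each rolling date range without having to loop over the 5MB array 365 times.

Maybe I'm misunderstanding the goal, but couldn't you iterate over the customers array just once and work out, for each customer, the range of days it counts toward? If I understand correctly, the range is always one month, so I could simply say that customer X, created on 1 Feb 2013, adds one unique customer to every day from 1 Feb through 28 Feb, right? That is, as long as we count each customer only once (unique customers), each customer "generates" just +1 on each of those days. Again, maybe I haven't understood you correctly, but if what I just said is true, you could do something like this: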
rolling_items = {}

customers.each do |customer|
  start_date = Date.parse(customer[:created])
  end_date = start_date + 30
  (start_date..end_date).each do |date|
    # Add empty Hash with default value 0 if date was not yet in Hash.
    # Add 1 for the customer, so we can see duplicates if we want
    (rolling_items[date] ||= Hash.new(0))[customer[:customer]] += 1
  end
end

rolling_items.each do |date, customers|
  uniq_customers = customers.keys.size # Hash keys are already unique, just count
  puts "\n%s => %s unique customers" % [date.strftime, uniq_customers]
  puts "-" * 20
  customers.each do |customer, times|
    puts "%s => %d" % [customer, times]
  end
end

# 2013-02-28 => 7 unique customers
# --------------------
# cus_05eOKvdnc3MkJO => 2
# cus_0e7LBxIfqSyLAP => 2
# cus_05HVTILpv7CuVS => 2
# cus_1CD4BnX3jDcA3g => 2
# cus_0G9GwU25yAT0ih => 1
# cus_1BqrfANA13SoNc => 3
# cus_0S12vFMb8r6ef1 => 2

# 2013-03-01 ... etc

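
By the way, there are a lot of duplicate customer entries with the same date in there, and I'm not sure whether that is intentional. I took the first 14 items of your giant array.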
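
If the per-day hashes above turn out to be too heavy, one possible variation on the same single-pass idea is a sliding window. This is only a sketch under some assumptions: rolling_unique_counts and rolling_start are made-up names, the days must be passed in ascending order, Date#<< stands in for ActiveSupport's `- 1.month`, and the sample records are invented. It buckets the records by date once, then adds the days that enter the window and removes the days that leave it, so each record is touched about twice in total.

```ruby
require 'date'

# Plain-Ruby stand-in for rolling_start_date (assumption: same month clamping).
def rolling_start(day)
  (day << 1) + 1
end

# Hypothetical helper: bucket records by date once, then slide the window
# forward over `days` (ascending), tracking how often each customer occurs.
def rolling_unique_counts(records, days)
  ids_by_date = Hash.new { |hash, date| hash[date] = [] }
  records.each do |record|
    ids_by_date[Date.parse(record[:created].to_s)] << record[:customer]
  end

  in_window = Hash.new(0) # customer id => occurrences inside the window
  lo = hi = nil           # the window currently covers lo..hi

  days.map do |day|
    start = rolling_start(day)
    lo ||= start
    hi ||= start - 1      # start with an empty window

    while hi < day        # days entering the window
      hi += 1
      ids_by_date.fetch(hi, []).each { |id| in_window[id] += 1 }
    end

    while lo < start      # days leaving the window
      ids_by_date.fetch(lo, []).each do |id|
        in_window[id] -= 1
        in_window.delete(id) if in_window[id].zero?
      end
      lo += 1
    end

    { created: day, amount: in_window.size }
  end
end

# Invented sample data, only to show the shape of the result.
sample = [
  { created: '2014-01-15', customer: 'cus_a' },
  { created: '2014-01-20', customer: 'cus_a' },
  { created: '2014-02-10', customer: 'cus_b' },
  { created: '2014-02-16', customer: 'cus_c' }
]
days = Date.new(2014, 2, 14)..Date.new(2014, 2, 21)
rolling_unique_counts(sample, days).each do |row|
  puts "#{row[:created]} => #{row[:amount]}"
end
# amounts for 14..21 Feb: 2, 2, 3, 3, 3, 3, 2, 2
```

The occurrence counter is what keeps duplicates honest: a customer only drops out of the unique count once every one of its records has left the window.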
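
This is an interesting and well-stated question. Perhaps you could break up the line new_items = options… to avoid horizontal scrolling. Do I understand correctly that {:created=>Fri, 21 Feb 2014, :amount=>2711} means you added 2711 customers over the past month or so? If so, how about having each hash hold the running total of customers as of that date instead, and then computing differences as needed to get the rolling values?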
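
A rough sketch of that running-total idea, with loud caveats: rolling_from_totals is a made-up name, Date#<< stands in for ActiveSupport's `- 1.month`, the sample records are invented, and each customer is counted only on the first date it appears. That last assumption matters: if customers recur across windows (the data apparently contains duplicate entries), the difference counts customers first seen inside the window, which is not the same as the unique customers in the window.

```ruby
require 'date'

# Hypothetical helper: cumulative totals per date, then one subtraction per day.
def rolling_from_totals(records, days)
  # Assumption: a customer counts once, on the earliest date it appears.
  first_seen = {}
  records.each do |record|
    date = Date.parse(record[:created].to_s)
    id = record[:customer]
    first_seen[id] = date if !first_seen.key?(id) || date < first_seen[id]
  end

  new_per_day = Hash.new(0)
  first_seen.each_value { |date| new_per_day[date] += 1 }

  # Running total of customers as of each date, starting early enough to
  # cover the day just before the first window.
  first = [new_per_day.keys.min, days.first << 1].compact.min
  running = 0
  totals = {}
  (first..days.last).each do |date|
    running += new_per_day.fetch(date, 0)
    totals[date] = running
  end

  days.map do |day|
    before_window = day << 1 # the window itself starts one day later
    { created: day, amount: totals[day] - totals[before_window] }
  end
end

# Invented sample data in which no customer recurs.
sample = [
  { created: '2014-01-15', customer: 'cus_a' },
  { created: '2014-02-10', customer: 'cus_b' },
  { created: '2014-02-16', customer: 'cus_c' }
]
days = Date.new(2014, 2, 14)..Date.new(2014, 2, 21)
rolling_from_totals(sample, days).each do |row|
  puts "#{row[:created]} => #{row[:amount]}"
end
# amounts for 14..21 Feb: 2, 1, 2, 2, 2, 2, 2, 2
```

The appeal is that after one pass to build the totals, each rolling value costs a single subtraction.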