Ruby on Rails: performance of looking up a hash in a Ruby array of hashes
I'm facing this problem right now. For example, I have this array of hashes:
data = [
{:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
{:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
{:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]
I want to find the exact hash whose start_date to end_date range contains "2015-01-04".
Following the documentation, I found there are 3 ways to do this:
1) Using select
finding_hash = data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
After this, finding_hash holds the result I need.
2) Using find
finding_hash = data.find {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
3) Traditional loop
data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
return t
break
end
end
So which way is the fastest? I really need the performance because my data is very large. Thank you, and sorry for my bad English.
v3 is the fastest:
def v1
@data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
def v2
@data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
def v3
@data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
return t
break
end
end
end
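Running it on the sample data is not the same as running it on real data.
If the real data is too large, you can run it on a subset of the data to get a better answer.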
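As a rough sketch of that idea, the three variants can be timed on a larger, generated data set instead of the three sample rows. The build_sample helper, the sample size, and the repetition count are all assumptions made for illustration; real data will be distributed differently, so treat the numbers only as a hint.

```ruby
require 'benchmark'
require 'date'

# Hypothetical helper: builds `count` consecutive, non-overlapping 3-day
# ranges in the same shape as the question's data.
def build_sample(count, first_day = Date.new(2015, 1, 1))
  Array.new(count) do |i|
    start = first_day + i * 3
    { :id => i + 1, :start_date => start.to_s, :end_date => (start + 2).to_s }
  end
end

sample = build_sample(10_000)
target = sample[sample.size / 2][:start_date] # a date in the middle of the sample

Benchmark.bm(8) do |bm|
  bm.report('select') do
    100.times { sample.select { |h| h[:start_date] <= target && h[:end_date] >= target } }
  end
  bm.report('find') do
    100.times { sample.find { |h| h[:start_date] <= target && h[:end_date] >= target } }
  end
  bm.report('each') do
    100.times { sample.each { |h| break h if h[:start_date] <= target && h[:end_date] >= target } }
  end
end
```

Because the target sits in the middle of the array, find and each stop halfway through while select always scans everything, so the gap between them should grow with the size of the data.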
By the way, you can rewrite v3 as:
data.each do |t|
break t if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
end
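One thing to watch with the each/break form: when nothing matches, each returns the array itself rather than nil. A minimal sketch of a wrapper that returns nil on a miss is shown here; the method name find_covering and the second test date are assumptions for illustration, not part of the answer.

```ruby
data = [
  { :id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05" },
  { :id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07" }
]

# Hypothetical wrapper: returns the first hash whose range covers `day`, or nil.
def find_covering(rows, day)
  rows.each do |t|
    return t if t[:start_date] <= day && t[:end_date] >= day
  end
  nil # without this line the method would return `rows` itself on a miss
end

hit  = find_covering(data, "2015-01-04") # the hash with :id => 1
miss = find_covering(data, "2015-02-01") # nil, no range covers this date

# The bare each/break form hands back the whole array when nothing matches.
bare = data.each { |t| break t if t[:start_date] <= "2015-02-01" && t[:end_date] >= "2015-02-01" }
```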
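FWIW, operating on an array is going to be clunky and slow. You probably want to keep this in a database and run a query. For large data sets that is likely to be at least 2 orders of magnitude faster.
You can measure this yourself with benchmark, for example: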
require 'benchmark'
n = 1000000
data = [
{:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
{:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
{:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]
Benchmark.bm do |x|
x.report { n.times do
data.select {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
}
x.report { n.times do
data.find{|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
end
}
x.report {
n.times do
finding_hash = {}
data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
finding_hash = t
break
end
end
end
}
end
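The results depend on the value of n and the size of the data.
All the methods you have tried are Enumerable methods, but native Array methods are faster. Try find_index. Even though it needs a separate call to load the hash, it is still about 20% faster than the next fastest: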
index = data.find_index {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
x = data[index]
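Note that find_index returns nil when no hash matches, and data[nil] raises a TypeError. A small sketch of a guarded lookup, assuming a hypothetical helper named lookup:

```ruby
data = [
  { :id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05" },
  { :id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07" },
  { :id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20" }
]

# Hypothetical helper: same find_index approach, but returns nil on a miss
# instead of raising when the index is nil.
def lookup(rows, day)
  index = rows.find_index { |h| h[:start_date] <= day && h[:end_date] >= day }
  index && rows[index]
end

found   = lookup(data, "2015-01-04") # the hash with :id => 1
missing = lookup(data, "2015-03-01") # nil
```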
All of these variants are O(n).
If your ranges do not overlap, you can use Array#bsearch, which is O(log n). You should sort your ranges first:
sorted = data.sort_by { |x| x[:start_date] }
sorted.bsearch { |x| ..check if range of `x` includes value.. }
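One way to fill in that block is bsearch's find-any mode, where the block returns 0 for a hit, a negative number to continue in the left half, and a positive number to continue in the right half. This is only a sketch under the stated assumption that the ranges never overlap; the helper name find_range is made up for illustration.

```ruby
data = [
  { :id => 3, :start_date => "2015-01-10", :end_date => "2015-01-20" },
  { :id => 1, :start_date => "2015-01-02", :end_date => "2015-01-05" },
  { :id => 2, :start_date => "2015-01-06", :end_date => "2015-01-07" }
]

# Sort once up front; every lookup afterwards is O(log n).
sorted = data.sort_by { |x| x[:start_date] }

# Hypothetical helper: binary search over sorted, non-overlapping ranges.
def find_range(sorted_rows, day)
  sorted_rows.bsearch do |x|
    if day < x[:start_date]
      -1 # this range starts after the day: continue in the left half
    elsif day > x[:end_date]
      1  # this range ends before the day: continue in the right half
    else
      0  # the day falls inside this range
    end
  end
end

match = find_range(sorted, "2015-01-04") # the hash with :id => 1
gap   = find_range(sorted, "2015-01-08") # nil, the date falls between two ranges
```

The sort itself costs O(n log n), so this only pays off when the same sorted array is searched many times.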
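If your data is very large, you should put it in a database and index it. Even SQLite could probably handle something like this easily.
Is it safe to assume that the hashes in the array are sorted by date?
@spickermann: No, it's random, my friend.
So you have three pieces of code and you want to know which one is fastest. Why don't you measure the performance?
@SergioTulentsev: Sorry, I didn't know the right way to do it, so I added two new values, start_time and end_time. At the end of each snippet I printed end_time - start_time, but it didn't work very well...
@DuongBach: Don't post "thank you" comments. An upvote is the best thanks (if you really found it useful).
@Sergio Tulentsev: Sorry, I found the mistake, I will correct it.
@SergioTulentsev: OK :)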
      user     system      total        real
  1.490000   0.020000   1.510000 (  1.533589)
  1.070000   0.010000   1.080000 (  1.096578)
  1.000000   0.010000   1.010000 (  1.011021)
require 'benchmark'
n = 1_000_000
data = [
{:id => 1,:start_date => "2015-01-02",:end_date => "2015-01-05"},
{:id => 2,:start_date => "2015-01-06",:end_date => "2015-01-07"},
{:id => 3,:start_date => "2015-01-10",:end_date => "2015-01-20"}
]
Benchmark.bm do |x|
x.report 'Enumerable#select' do
n.times do
data.select do |h|
h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
end
end
end
x.report 'Enumerable#detect' do
n.times do
data.detect do |h|
h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"
end
end
end
x.report 'Enumerable#each ' do
n.times do
finding_hash = {}
data.each do |t|
if (t[:start_date] <= "2015-01-04" && t[:end_date] >= "2015-01-04")
finding_hash = t
break t
end
end
end
end
x.report 'Array#find_index ' do
n.times do
index = data.find_index {|h| h[:start_date] <= "2015-01-04" && h[:end_date] >= "2015-01-04"}
found = data[index] # named `found` so it does not overwrite the Benchmark block parameter x
end
end
end
Enumerable#select 1.000000 0.010000 1.010000 ( 1.002282)
Enumerable#detect 0.790000 0.000000 0.790000 ( 0.797319)
Enumerable#each 0.620000 0.000000 0.620000 ( 0.627272)
Array#find_index 0.520000 0.000000 0.520000 ( 0.515691)