Ruby: check whether two hashes have the same set of keys


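What is the most efficient way to check whether two hashes h1 and h2 have the same set of keys, disregarding order? Can it be made faster or more concise than the answer I posted?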

Here is my attempt:

(h1.keys - h2.keys).empty? and (h2.keys - h1.keys).empty?
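Both directions of the difference are needed because Array#- is not symmetric. A small sketch of what a one-sided check would miss; the helper names and sample hashes are made up for illustration and are not part of the question:

```ruby
# Hypothetical helpers, for illustration only.

# True when every key of h1 also appears in h2.
def subset_keys?(h1, h2)
  (h1.keys - h2.keys).empty?
end

# True when both hashes have exactly the same keys, in any order.
def same_key_set?(h1, h2)
  subset_keys?(h1, h2) && subset_keys?(h2, h1)
end

small = { a: 1, b: 2 }
large = { a: 3, b: 4, c: 5 } # one extra key

puts subset_keys?(small, large)  # true: the one-sided check passes
puts same_key_set?(small, large) # false: large has :c, which small lacks
```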

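Try:

Worst case, you only iterate over the keys once.

Combining the ideas above:

It depends on your data.

There is really no general case. For example, retrieving the whole key set at once is usually faster than checking the inclusion of each key separately. However, if the key sets in your data differ more often than not, a slower solution that fails faster might be quicker overall. For example: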

h1.size == h2.size and h1.keys.all?{|k|h2.include?(k)}

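Another factor to consider is the size of the hashes. If they are large, a solution with a higher setup cost, such as calling Set.new, might pay off, but if they are small, it won't: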

h1.size == h2.size and Set.new(h1.keys) == Set.new(h2.keys)

If you happen to compare the same immutable hashes over and over again, caching the results will definitely pay off.
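One way the caching idea could look is sketched below. It assumes the hashes are frozen and caches the key Set per hash object. The class name KeySetCache and its methods are hypothetical, not taken from the answer.

```ruby
require 'set'

# Hypothetical helper: remembers the key Set of each frozen hash it has seen.
class KeySetCache
  def initialize
    # Compare cache keys by object identity, so two equal but distinct
    # hashes get separate entries and lookups never hash the whole hash.
    @cache = {}.compare_by_identity
  end

  # Returns the (cached) Set of keys for a frozen hash.
  def key_set(hash)
    raise ArgumentError, "hash must be frozen" unless hash.frozen?
    @cache[hash] ||= Set.new(hash.keys).freeze
  end

  # Cheap size check first, then compare the cached key sets.
  def same_keys?(h1, h2)
    h1.size == h2.size && key_set(h1) == key_set(h2)
  end
end

cache = KeySetCache.new
h1 = { a: 1, b: 2 }.freeze
h2 = { b: 3, a: 4 }.freeze
puts cache.same_keys?(h1, h2) # true
```

Note that this cache keeps a strong reference to every hash it has seen, so it only suits a bounded set of long-lived hashes.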

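In the end only a benchmark will tell, but to write one we would need to know more about your use case. Testing a solution with synthetic data (for example, randomly generated keys) will certainly not be representative.

Just to have at least one benchmark on this question: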

require 'securerandom'
require 'benchmark'

a = {}
b = {}

# Use uuid to get a unique random key
(0..1_000).each do |i|
  key = SecureRandom.uuid
  a[key] = i
  b[key] = i
end

Benchmark.bmbm do |x|
  x.report("#-") do
    1_000.times do
      (a.keys - b.keys).empty? and (b.keys - a.keys).empty?
    end
  end

  x.report("#&") do
    1_000.times do
      computed = a.keys & b.keys
      computed.size == a.size
    end
  end

  x.report("#all?") do
    1_000.times do
      a.keys.all?{ |key| !!b[key] }
    end
  end

  x.report("#sort") do
    1_000.times do
      a_sorted = a.keys.sort
      b_sorted = b.keys.sort
      a_sorted == b_sorted
    end
  end
end

The results are:

Rehearsal -----------------------------------------
#-      1.000000   0.000000   1.000000 (  1.001348)
#&      0.560000   0.000000   0.560000 (  0.563523)
#all?   0.240000   0.000000   0.240000 (  0.239058)
#sort   0.850000   0.010000   0.860000 (  0.854839)
-------------------------------- total: 2.660000sec

            user     system      total        real
#-      0.980000   0.000000   0.980000 (  0.976698)
#&      0.560000   0.000000   0.560000 (  0.559592)
#all?   0.250000   0.000000   0.250000 (  0.251128)
#sort   0.860000   0.000000   0.860000 (  0.862857)

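I agree with @akuhn that this would be a better benchmark if we had more information about the data set you are using. That said, I believe this question really needed some hard facts.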
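One caveat about the #all? variant in the benchmark above: `!!b[key]` tests the value, not the presence of the key, so a key whose value is nil or false is treated as missing. A short sketch with made-up data and hypothetical method names, comparing it with Hash#key?:

```ruby
# Hypothetical helpers, for illustration only.

# Mirrors the benchmarked check: looks at the value stored under each key.
def same_keys_by_value?(h1, h2)
  h1.size == h2.size && h1.keys.all? { |key| !!h2[key] }
end

# Asks the hash whether the key exists, regardless of its value.
def same_keys_by_key?(h1, h2)
  h1.size == h2.size && h1.keys.all? { |key| h2.key?(key) }
end

a = { x: 1,   y: 2 }
b = { x: nil, y: false } # same keys, falsy values

puts same_keys_by_value?(a, b) # false, although the key sets are equal
puts same_keys_by_key?(a, b)   # true
```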

Okay, let's break all the rules of good manners and portability. MRI's C API comes into play.

/* Name this file superhash.c. An appropriate Makefile is attached below. */
#include <ruby/ruby.h>

static int key_is_in_other(VALUE key, VALUE val, VALUE data) {
  struct st_table *other = ((struct st_table**) data)[0];
  if (st_lookup(other, key, 0)) {
    return ST_CONTINUE;
  } else {
    int *failed = ((int**) data)[1];
    *failed = 1;
    return ST_STOP;
  }
}

static VALUE hash_size(VALUE hash) {
  if (!RHASH(hash)->ntbl)
    return INT2FIX(0);
  return INT2FIX(RHASH(hash)->ntbl->num_entries);
}

static VALUE same_keys(VALUE self, VALUE other) {
  if (CLASS_OF(other) != rb_cHash)
    rb_raise(rb_eArgError, "argument needs to be a hash");
  if (hash_size(self) != hash_size(other))
    return Qfalse;
  if (!RHASH(self)->ntbl && !RHASH(other)->ntbl)
    return Qtrue;
  int failed = 0;
  void *data[2] = { RHASH(other)->ntbl, &failed };
  rb_hash_foreach(self, key_is_in_other, (VALUE) data);
  return failed ? Qfalse : Qtrue;
}

void Init_superhash(void) {
  rb_define_method(rb_cHash, "same_keys?", same_keys, 1);
}

An artificial, synthetic and simplistic benchmark shows the following:

require 'superhash'
require 'benchmark'
n = 100_000
h1 = h2 = {a:5, b:8, c:1, d:9}
Benchmark.bm do |b|
  # freemasonjson's state of the art.
  b.report { n.times { h1.size == h2.size and h1.keys.all? { |key| !!h2[key] }}}
  # This solution
  b.report { n.times { h1.same_keys? h2} }
end
#       user     system      total        real
#   0.310000   0.000000   0.310000 (  0.312249)
#   0.050000   0.000000   0.050000 (  0.051807)

Here is my solution:

class Hash
    # doesn't check recursively
    def same_keys?(compare)
        if compare.class == Hash
            if self.size == compare.size
               self.keys.all? {|s| compare.key?(s)}
            else
                return false
            end
        else
            nil
        end
    end
end

a = c = {  a: nil,    b: "whatever1",  c: 1.14,     d: true   }
b     = {  a: "foo",  b: "whatever2",  c: 2.14,   "d": false  }
d     = {  a: "bar",  b: "whatever3",  c: 3.14,               }

puts a.same_keys?(b)                    # => true
puts a.same_keys?(c)                    # => true
puts a.same_keys?(d)                    # => false   
puts a.same_keys?(false).inspect        # => nil
puts a.same_keys?("jack").inspect       # => nil
puts a.same_keys?({}).inspect           # => false
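A variant of the solution above that avoids reopening Hash could be sketched as a module function. The module name HashKeys is hypothetical, and raising on non-hash arguments (instead of returning nil) is a design choice of this sketch, not something the answer prescribes.

```ruby
# Hypothetical module; nothing here comes from the answers above.
module HashKeys
  module_function

  # Returns true when both hashes have the same key set (not recursive).
  # is_a?(Hash) also accepts Hash subclasses, unlike a strict class check.
  def same?(h1, h2)
    unless h1.is_a?(Hash) && h2.is_a?(Hash)
      raise ArgumentError, "both arguments must be hashes"
    end
    # each_key avoids building an intermediate array of keys.
    h1.size == h2.size && h1.each_key.all? { |key| h2.key?(key) }
  end
end

puts HashKeys.same?({ a: nil, b: 1 }, { b: 2, a: 3 }) # true
puts HashKeys.same?({ a: nil, b: 1 }, { a: 3 })       # false
```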

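Have you compared it with h1.keys.sort == h2.keys.sort? With the limited example I used, h1.keys.sort == h2.keys.sort was slightly slower. But I am not sure whether that holds in general.

I think you should mention this in the question. And I would put your attempt in the question rather than in an answer; I think that is purely a matter of convenience. You would write "Is there something easier than my answer?" and now I have to scroll down, parse the answers and find yours. That is extra work for me for no reason.

An off-topic question: is this purely for fun, or do you have very large hashes (and have profiled your code), where improving this part of the code would give you a big performance gain?

Even better, h2.include?(key). I did some benchmarking, and so far this answer seems to be the clear winner. Using Hash#include? brings no performance improvement, but it is definitely a nice step forward in readability.

if a then b end

a and b

@Jan Beware of benchmarks, especially synthetic ones! This solution (with or without include?) will be faster if and only if the key sets differ more often than not. If the main case is equal key sets, it will be slower.

@akuhn, that is what my benchmark shows. It was a surprise, but when I think about it, it makes sense. Unlike the other answers, this solution does not create many additional objects in memory. So it is GC friendly, which is a huge advantage given MRI's GC performance.

I suggest adding the name of each benchmark as an argument to the report method. That puts the names in the results report and makes it easier to read.

Wow, awesome! I have to get reacquainted with C.

The Makefile for superhash.c:
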
CFLAGS=-std=c99 -O2 -Wall -fPIC $(shell pkg-config ruby-1.9 --cflags)
LDFLAGS=-Wl,-O1,--as-needed $(shell pkg-config ruby-1.9 --libs)
superhash.so: superhash.o
	$(LINK.c) -shared $^ -o $@
