在ruby中计算汉明距离最有效的方法是什么？_Ruby_Hamming Distance

在ruby中计算汉明距离最有效的方法是什么？

ruby

在ruby中计算汉明距离最有效的方法是什么？,ruby,hamming-distance,Ruby,Hamming Distance,在ruby中，计算两个无符号整数之间的位差（例如，汉明距离）最有效的方法是什么例如，我有整数a=2323409845和b=1782647144 它们的二进制表示形式是： a = 10001010011111000110101110110101 b = 01101010010000010000100101101000 a和b之间的位差为17 我可以对它们进行逻辑异或运算，但这会给我一个不同的整数17，然后我必须遍历结果的二进制表示，并计算1的# 计算位差的最有效方法是什么现在，计算多个整数序

在ruby中，计算两个无符号整数之间的位差（例如，汉明距离）最有效的方法是什么

例如，我有整数a=2323409845和b=1782647144

它们的二进制表示形式是：

a = 10001010011111000110101110110101
b = 01101010010000010000100101101000

a和b之间的位差为17

我可以对它们进行逻辑异或运算，但这会给我一个不同的整数17，然后我必须遍历结果的二进制表示，并计算1的#

计算位差的最有效方法是什么

现在，计算多个整数序列的位差时，答案会改变吗？例如，给定2个无符号整数序列：

x = {2323409845,641760420,509499086....}
y = {uint,uint,uint...}

计算两个序列之间位差的最有效方法是什么

您是要遍历序列，还是有一种更快的方法可以一次计算整个序列的差异？

Wegner的算法：

def hamm_dist(a, b)
  dist = 0
  val = a ^ b

  while not val.zero?
    dist += 1
    val &= val - 1
  end
  dist
end

p hamm_dist(2323409845, 1782647144) # => 17

您可以使用Ruby中优化的字符串函数来进行位计数，而不是纯粹的算术运算。通过一些快速的基准测试，它的速度快了大约6倍

def h2(a, b)
  (a^b).to_s(2).count("1")
end

h1是计算的正常方式，而h2将xor转换为字符串，并计算“1”的数量

基准：

ruby-1.9.2-p180:001:0>> def h1(a, b)
ruby-1.9.2-p180:002:1*> ret = 0
ruby-1.9.2-p180:003:1*> xor = a ^ b
ruby-1.9.2-p180:004:1*> until xor == 0
ruby-1.9.2-p180:005:2*> ret += 1
ruby-1.9.2-p180:006:2*> xor &= xor - 1
ruby-1.9.2-p180:007:2*> end
ruby-1.9.2-p180:008:1*> ret
ruby-1.9.2-p180:009:1*> end
# => nil
ruby-1.9.2-p180:010:0>> def h2(a, b)
ruby-1.9.2-p180:011:1*> (a^b).to_s(2).count("1")
ruby-1.9.2-p180:012:1*> end
# => nil
ruby-1.9.2-p180:013:0>> h1(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:014:0>> h2(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:015:0>> quickbench(10**5) { h1(2323409845, 1782647144) }
Rehearsal ------------------------------------
   2.060000   0.000000   2.060000 (  1.944690)
--------------------------- total: 2.060000sec

       user     system      total        real
   1.990000   0.000000   1.990000 (  1.958056)
# => nil
ruby-1.9.2-p180:016:0>> quickbench(10**5) { h2(2323409845, 1782647144) }
Rehearsal ------------------------------------
   0.340000   0.000000   0.340000 (  0.333673)
--------------------------- total: 0.340000sec

       user     system      total        real
   0.320000   0.000000   0.320000 (  0.326854)
# => nil
ruby-1.9.2-p180:017:0>>

根据mu的建议，我编写了一个简单的C扩展来使用uuuBuiltin_popcount，并使用基准测试验证它至少比ruby的优化字符串函数快3倍

我看了以下两个教程：

在我的节目中：

require './FastPopcount/fastpopcount.so'
include FastPopcount

def hamming(a,b)
  popcount(a^b)
end

然后在包含我的程序的目录中，我创建了一个包含以下文件的文件夹“PopCount”

extconf.rb：

# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'

# Give it a name
extension_name = 'fastpopcount'

# The destination
dir_config(extension_name)

# Do the work
create_makefile(extension_name)

popcount.c：

// Include the Ruby headers and goodies
#include "ruby.h"

// Defining a space for information and references about the module to be stored internally
VALUE FastPopcount = Qnil;

// Prototype for the initialization method - Ruby calls this, not you
void Init_fastpopcount();

// Prototype for our method 'popcount' - methods are prefixed by 'method_' here
VALUE method_popcount(int argc, VALUE *argv, VALUE self);

// The initialization method for this module
void Init_fastpopcount() {
    FastPopcount = rb_define_module("FastPopcount");
    rb_define_method(FastPopcount, "popcount", method_popcount, 1); 
}

// Our 'popcount' method.. it uses the builtin popcount
VALUE method_popcount(int argc, VALUE *argv, VALUE self) {
    return INT2NUM(__builtin_popcount(NUM2UINT(argv)));
}

然后在popcount目录中运行：

ruby extconf.rb 制造

然后运行程序，你就知道了……用ruby实现汉明距离的最快方法。

如果你想遵循基于c的路径，最好在makefile中添加编译器标志

-msse4.2

。这允许编译器生成基于硬件的

popcnt

指令，而不是使用表生成popcount。在我的系统上，这大约快了2.5倍。

非常感谢，我发现这也快了很多。按照您的建议，使用内置字符串函数进行大约21K的比较大约需要3秒钟，而传统方法需要两倍的长！我刚刚这么做了，似乎比下面的方法（使用Ruby的优化字符串函数）快了3倍。我很晚才参加这个聚会，但你可能想试一试

\u内置的popcount

是最慢的方法之一