Ruby: What's the best technique for comparing image similarity?

ruby performance image-processing


I have an image, master.png, and more than 10,000 other images (slave_1.png, slave_2.png, ...). They all have:

the same dimensions (e.g. 100x50 pixels)
the same format (png)
the same image background

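98% of the slaves are identical to the master, but 2% of them have slightly different content:

new colors appear
new small figures appear in the image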

I need to find those different slaves. I'm using Ruby, but I have no problem using another technology.

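I tried reading both images with File.binread and comparing them with ==. That works for 80% of the slaves. For the others it reports a change even though the images are visually identical, so it doesn't work.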
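One possible explanation for those false positives, offered as an assumption rather than a diagnosis of these particular files: PNG stores its pixel data zlib-compressed, so two files can decode to the same pixels and still differ byte for byte when they were written with different compression settings or metadata. The sketch below shows the effect on a made-up pixel buffer using only Ruby's standard zlib library; it does not read a real PNG.

```ruby
require 'zlib'

# Hypothetical raw pixel data: 100x50 pixels, 3 bytes (RGB) per pixel.
pixels = (0...(100 * 50 * 3)).map { |i| (i / 7) % 256 }.pack('C*')

# The same pixels compressed with two different settings,
# as two different encoders might do.
fast  = Zlib::Deflate.deflate(pixels, Zlib::BEST_SPEED)
small = Zlib::Deflate.deflate(pixels, Zlib::BEST_COMPRESSION)

# Compare the stored bytes, then compare what they decode to.
bytes_equal  = (fast == small)
pixels_equal = (Zlib::Inflate.inflate(fast) == Zlib::Inflate.inflate(small))

puts "compressed bytes equal: #{bytes_equal}"
puts "decoded pixels equal:   #{pixels_equal}"
```

If this is what is happening, a byte comparison can only prove that two files are the same; it cannot prove that two images look different.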
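The alternatives are:
Count the number of colors present in each slave and compare it with the master. That would work 100% of the time, but I don't know how to do it in a "light" way in Ruby.
Use an image processing library to compare histograms, such as RMagick or ruby-vips8. This approach should also work, but I need to consume as little CPU/memory as possible.
Write a C++/Go/Crystal program that reads pixel by pixel and returns the number of colors. I think we could get good performance this way, but it is certainly the hard road.
Any insights? Suggestions?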
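The first alternative, comparing the colors present, can be sketched in plain Ruby once the pixels have been decoded. This snippet assumes each image is already available as a packed RGB byte string (for example from an image library's raw pixel export); `distinct_colors` and `new_colors` are hypothetical helper names, not part of any library.

```ruby
require 'set'

# Return the set of distinct [r, g, b] triples in a packed RGB byte string.
def distinct_colors(raw_rgb)
  raw_rgb.unpack('C*').each_slice(3).to_set
end

# Colors that appear in the sample but not in the master.
def new_colors(master_rgb, sample_rgb)
  distinct_colors(sample_rgb) - distinct_colors(master_rgb)
end

# Toy data: a 2x2 master (white and black) and a sample with one red pixel.
master = [255, 255, 255,  0, 0, 0,  255, 255, 255,  0, 0, 0].pack('C*')
sample = [255, 255, 255,  0, 0, 0,  255, 0, 0,      0, 0, 0].pack('C*')

extra = new_colors(master, sample)
puts "new colors: #{extra.to_a.inspect}"
```

Note that a color set catches new colors, but not a new figure drawn in colors the master already contains.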
In ruby-vips, you can do this:
require 'vips'

# find normalised histogram of reference image
ref = VIPS::Image.new ARGV[0], :sequential => true
ref_hist = ref.hist.histnorm

# trigger a GC every few loops to keep memuse down
loop = 0

ARGV[1..-1].each do |filename|
    # find sample hist
    sample = VIPS::Image.new filename, :sequential => true
    sample_hist = sample.hist.histnorm

    # calculate sum of squares of differences, if it's over a threshold, print
    # the filename
    diff_hist = ref_hist.subtract(sample_hist).pow(2)
    diff = diff_hist.avg * diff_hist.x_size * diff_hist.y_size

    if diff > 100
        puts "#{filename}, #{diff}"
    end

    loop += 1
    if loop % 100 == 0
        GC.start
    end
end

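The occasional GC.start is needed to make Ruby free things and stop memory from filling up. Sadly, even though it only runs once every 100 images, it still spends a lot of time garbage collecting.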
$ vips crop ~/pics/k2.jpg ref.png 0 0 100 50
$ for i in {1..10000}; do cp ref.png $i.png; done
$ time ../similarity.rb ref.png *.png
real    2m44.294s
user    7m30.696s
sys 0m20.780s
peak mem 270mb
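For readers without libvips to hand, the distance the script computes (the average of the squared differences times the number of bins, which is simply their sum) can be illustrated on plain arrays. This is only a sketch: it skips the normalisation step, works on a single band of toy values, and `histogram` and `hist_distance` are hypothetical names.

```ruby
# Count how many times each value 0..255 occurs in an array of byte values.
def histogram(values)
  counts = Array.new(256, 0)
  values.each { |v| counts[v] += 1 }
  counts
end

# Sum of squared differences between two equal-length histograms.
def hist_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

reference = [10, 10, 10, 200]
same      = [10, 200, 10, 10]   # same values, different positions
changed   = [10, 10, 99, 200]   # one value replaced

ref_hist = histogram(reference)
puts hist_distance(ref_hist, histogram(same))     # prints 0
puts hist_distance(ref_hist, histogram(changed))  # prints 2
```

The first result shows a property of any histogram comparison: it ignores where pixels are, so an image whose pixels have only moved scores as identical.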

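If you're willing to consider Python, it's a lot faster, since Python uses reference counting and doesn't need to scan memory all the time.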
import sys
from gi.repository import Vips

# find normalised histogram of reference image
ref = Vips.Image.new_from_file(sys.argv[1], access = Vips.Access.SEQUENTIAL)
ref_hist = ref.hist_find().hist_norm()

for filename in sys.argv[2:]:
    # find sample hist
    sample = Vips.Image.new_from_file(filename, access = Vips.Access.SEQUENTIAL)
    sample_hist = sample.hist_find().hist_norm()

    # calculate sum of squares of difference, if it's over a threshold, print
    # the filename
    diff_hist = (ref_hist - sample_hist) ** 2
    diff = diff_hist.avg() * diff_hist.width * diff_hist.height

    if diff > 100:
        print("{}, {}".format(filename, diff))

I see:
$ time ../similarity.py ref.png *.png
real    1m4.001s
user    1m3.508s
sys 0m10.060s
peak mem 58mb

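Many options are discussed here.
Another note about comparing with File.binread: since you are only comparing file contents, and resources and performance matter, you would be better off doing the comparison from bash. Look at diff, cmp or md5.
If what you need is a classifier, this may be a job for one.
When you say you want to do it in a "light" way, do you mean you don't want to use much CPU? Or do you mean you want the answer quickly, which might mean using all the CPUs for a while?
@MarkSetchell By "light", I mean using as little CPU/RAM as possible.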
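The diff/cmp/md5 suggestion can also be applied from Ruby as a cheap first pass: files whose digest matches the master's can be treated as byte-identical and skipped, leaving only the remainder for a pixel-level check. A minimal sketch using the standard digest library follows; `byte_different` is a hypothetical helper, and the temporary files merely stand in for the real PNGs.

```ruby
require 'digest'
require 'tmpdir'

# Return the paths whose bytes differ from the master file.
# A matching digest is treated as "byte-identical, no further check needed".
def byte_different(master_path, candidate_paths)
  master_digest = Digest::MD5.file(master_path).hexdigest
  candidate_paths.reject { |path| Digest::MD5.file(path).hexdigest == master_digest }
end

# Toy demonstration with throwaway files standing in for the PNGs.
suspects = Dir.mktmpdir do |dir|
  master = File.join(dir, 'master.png')
  copy   = File.join(dir, 'slave_1.png')
  other  = File.join(dir, 'slave_2.png')
  File.binwrite(master, 'same bytes')
  File.binwrite(copy, 'same bytes')
  File.binwrite(other, 'other bytes')
  byte_different(master, [copy, other]).map { |path| File.basename(path) }
end

puts suspects.inspect
```

Files flagged here are only suspects, since differing bytes do not imply differing pixels; they would still need an image-level comparison.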