Java 在较大的图像中查找已知的子图像
有人知道在较大图像中定位已知图像的算法(或搜索词/描述)吗? e、 g 我有一个单一桌面窗口的图像,其中包含各种按钮和区域(目标)。我也有代码来捕获当前桌面的屏幕截图。我想一个算法,将帮助我找到在更大的桌面图像目标图像(什么确切的x和y坐标窗口位于)。目标图像可能位于较大图像中的任何位置,并且可能不是100%完全相同(非常相似但不完全相同,可能是操作系统显示差异的b/c) 有人知道这样的算法或算法类吗 我发现了各种图像分割和计算机视觉算法,但它们似乎适合于区域的“模糊”分类,而不是在另一个区域中定位特定的图像Java 在较大的图像中查找已知的子图像,java,algorithm,image-processing,image-manipulation,Java,Algorithm,Image Processing,Image Manipulation,有人知道在较大图像中定位已知图像的算法(或搜索词/描述)吗? e、 g 我有一个单一桌面窗口的图像,其中包含各种按钮和区域(目标)。我也有代码来捕获当前桌面的屏幕截图。我想一个算法,将帮助我找到在更大的桌面图像目标图像(什么确切的x和y坐标窗口位于)。目标图像可能位于较大图像中的任何位置,并且可能不是100%完全相同(非常相似但不完全相同,可能是操作系统显示差异的b/c) 有人知道这样的算法或算法类吗 我发现了各种图像分割和计算机视觉算法,但它们似乎适合于区域的“模糊”分类,而不是在另一个区域中
**我的目标是创建一个框架,在给定一些种子目标图像的情况下,可以在桌面上找到“查看”,找到目标区域并“观察”其变化** 你说你的图像可能不完全相同,但接着说你不想要“模糊”算法。我不确定它们是否兼容。但总的来说,我想你应该看看算法。有一个开源C++包,它可能提供一些提示。也是一个流行的开源Java包。如果您仔细研究一下,这两种方法都至少有一些注册功能可用。以下是您希望使用的代码框架:
// look for all (x,y) positions where target appears in desktop
List<Loc> findMatches(Image desktop, Image target, float threshold) {
List<Loc> locs;
for (int y=0; y<desktop.height()-target.height(); y++) {
for (int x=0; x<desktop.width()-target.width(); x++) {
if (imageDistance(desktop, x, y, target) < threshold) {
locs.append(Loc(x,y));
}
}
}
return locs;
}
// computes the root mean squared error between a rectangular window in
// bigImg and target.
float imageDistance(Image bigImg, int bx, int by, Image target) {
float dist = 0.0;
for (int y=0; y<target.height(); y++) {
for (int x=0; x<target.width(); x++) {
// assume RGB images...
for (int colorChannel=0; colorChannel<3; colorChannel++) {
dist += Math.pow(target.getPixel(x,y) - bigImg.getPixel(bx+x,by+y), 2);
}
}
}
return Math.sqrt(dist) / target.width() / target.height();
}
//查找目标出现在桌面上的所有(x,y)位置
列出findMatches(图像桌面、图像目标、浮动阈值){
列出LOC;
对于(int y=0;y请看我写的论文:。这篇文章非常详细,似乎是唯一一篇讨论如何将傅里叶变换应用于子图像查找问题的文章
简而言之,如果要使用傅里叶变换,可以应用以下公式:当图像A在dx上移动时,图像A和图像B之间的相关性,dy在以下矩阵中给出:C=ifft(fft(A)x共轭(fft(B)).因此,图像C中具有最高值的位置具有最高相关性,并且该位置反映了dx,dy
这个结果适用于相对较大的子图像。对于较小的图像,如本文所述,需要进行更多的工作。然而,这样的傅里叶变换速度非常快。它会导致大约3*sxsylog_2(sx*sy)+3*sx*sy运算。您不需要像“神经网络”中那样的模糊操作,因为(据我所知)您没有旋转、倾斜或类似功能。如果操作系统显示差异是唯一的修改,那么差异应该最小。
因此,WernerVanBelle的论文很好,但并不是真正必要的,MrFooz的代码可以工作,但效率非常低(O(宽度*高度*图案*宽度*图案\高度)
!)
我能想到的最好的算法是用于字符串搜索的Boyer-Moore算法,该算法经过修改,允许进行二维搜索。
代替一个位移,您必须为每种颜色存储一对位移dx
和dy
。检查像素时,您只在x方向移动x=x+dx
,并且只存储dy的最小值dy=min(dy,dy)
,以便在测试整行后设置新的y值(即x>宽度
)
由于可能颜色的imense数量,可能禁止为所有可能颜色创建表格,因此可以使用地图存储规则(如果颜色不在地图内,则默认为图案尺寸),或者分别为每种颜色创建表格,并设置dx=max(dx(红色)、dx(绿色)、dx(蓝色))
-这只是一个近似值,但消除了映射的开销
在坏字符规则的预处理过程中,您可以通过将规则从所有颜色扩展到其“相邻”颜色(无论您如何定义相邻颜色)来解释颜色的微小偏差。我考虑了Werner Van Belle的解决方案(因为所有其他方法都非常缓慢-根本不可行):
用于子图像正确定位的自适应滤波器:FFT
基于图像的子图像定位需要图像归一化才能工作
恰当地
我用C#编写了代码,我需要我的解决方案,但是我得到了非常不准确的结果。它是否真的工作得不好,与Van Belle的说法相反,或者我做错了什么?我用它作为FFTW的C#包装器
这是我的翻译代码:(C++中的原文)
使用系统诊断;
使用制度;
使用System.Runtime.InteropServices;
使用系统图;
使用系统、绘图、成像;
使用System.IO;
使用FFTWSharp;
使用unsigned1=System.Byte;
使用signed2=System.Int16;
使用signed8=System.Int64;
公共类子映像
{
/**
*此程序在较大的图像中查找子图像。它需要两个参数。
*第一个是它必须在其中显示的图像。第二个dimage是
*该程序依赖于许多不同的
*执行计算的步骤。
*
*它将首先对输入图像进行规范化,以改善图像质量
*互相关匹配。一旦找到最佳互相关
*sad匹配器应用于较大图像上的网格中。
*
*以下两篇文章解释了细节:
*
*Werner Van Belle;用于正确定位的自适应滤波器
*子图像:基于FFT的子图像定位需要图像
*正常工作的规范化;11页;2007年10月。
* http://werner.yellowcouch.org/Papers/subimg/
*
*沃纳·范贝尔;投入产出与总和之间的相关性
*对于上的均匀采样信号,绝对差值为-0.8485
*[-1:1];2006年11月
*/
不安全的公共静态点FindSubimage_fftw(字符串[]args)
{
我
using System.Diagnostics;
using System;
using System.Runtime.InteropServices;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using FFTWSharp;
using unsigned1 = System.Byte;
using signed2 = System.Int16;
using signed8 = System.Int64;
public class Subimage
{
/**
* This program finds a subimage in a larger image. It expects two arguments.
* The first is the image in which it must look. The secon dimage is the
* image that is to be found. The program relies on a number of different
* steps to perform the calculation.
*
* It will first normalize the input images in order to improve the
* crosscorrelation matching. Once the best crosscorrelation is found
* a sad-matchers is applied in a grid over the larger image.
*
* The following two article explains the details:
*
* Werner Van Belle; An Adaptive Filter for the Correct Localization
* of Subimages: FFT based Subimage Localization Requires Image
* Normalization to work properly; 11 pages; October 2007.
* http://werner.yellowcouch.org/Papers/subimg/
*
* Werner Van Belle; Correlation between the inproduct and the sum
* of absolute differences is -0.8485 for uniform sampled signals on
* [-1:1]; November 2006
*/
unsafe public static Point FindSubimage_fftw(string[] args)
{
if (args == null || args.Length != 2)
{
Console.Write("Usage: subimg\n" + "\n" + " subimg is an image matcher. It requires two arguments. The first\n" + " image should be the larger of the two. The program will search\n" + " for the best position to superimpose the needle image over the\n" + " haystack image. The output of the the program are the X and Y\n" + " coordinates.\n\n" + " http://werner.yellowouch.org/Papers/subimg/\n");
return new Point();
}
/**
* The larger image will be called A. The smaller image will be called B.
*
* The code below relies heavily upon fftw. The indices necessary for the
* fast r2c and c2r transforms are at best confusing. Especially regarding
* the number of rows and colums (watch our for Asx vs Asx2 !).
*
* After obtaining all the crosscorrelations we will scan through the image
* to find the best sad match. As such we make a backup of the original data
* in advance
*
*/
int Asx = 0, Asy = 0;
signed2[] A = read_image(args[0], ref Asx, ref Asy);
int Asx2 = Asx / 2 + 1;
int Bsx = 0, Bsy = 0;
signed2[] B = read_image(args[1], ref Bsx, ref Bsy);
unsigned1[] Asad = new unsigned1[Asx * Asy];
unsigned1[] Bsad = new unsigned1[Bsx * Bsy];
for (int i = 0; i < Bsx * Bsy; i++)
{
Bsad[i] = (unsigned1)B[i];
Asad[i] = (unsigned1)A[i];
}
for (int i = Bsx * Bsy; i < Asx * Asy; i++)
Asad[i] = (unsigned1)A[i];
/**
* Normalization and windowing of the images.
*
* The window size (wx,wy) is half the size of the smaller subimage. This
* is useful to have as much good information from the subimg.
*/
int wx = Bsx / 2;
int wy = Bsy / 2;
normalize(ref B, Bsx, Bsy, wx, wy);
normalize(ref A, Asx, Asy, wx, wy);
/**
* Preparation of the fourier transforms.
* Aa is the amplitude of image A. Af is the frequence of image A
* Similar for B. crosscors will hold the crosscorrelations.
*/
IntPtr Aa = fftw.malloc(sizeof(double) * Asx * Asy);
IntPtr Af = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy);
IntPtr Ba = fftw.malloc(sizeof(double) * Asx * Asy);
IntPtr Bf = fftw.malloc(sizeof(double) * 2 * Asx2 * Asy);
/**
* The forward transform of A goes from Aa to Af
* The forward tansform of B goes from Ba to Bf
* In Bf we will also calculate the inproduct of Af and Bf
* The backward transform then goes from Bf to Aa again. That
* variable is aliased as crosscors;
*/
//#original: fftw_plan_dft_r2c_2d
//IntPtr forwardA = fftwf.dft(2, new int[] { Asy, Asx }, Aa, Af, fftw_direction.Forward, fftw_flags.Estimate);//equal results
IntPtr forwardA = fftwf.dft_r2c_2d(Asy, Asx, Aa, Af, fftw_flags.Estimate);
//#original: fftw_plan_dft_r2c_2d
//IntPtr forwardB = fftwf.dft(2, new int[] { Asy, Asx }, Ba, Bf, fftw_direction.Forward, fftw_flags.Estimate);//equal results
IntPtr forwardB = fftwf.dft_r2c_2d(Asy, Asx, Ba, Bf, fftw_flags.Estimate);
double* crosscorrs = (double*)Aa;
//#original: fftw_plan_dft_c2r_2d
//IntPtr backward = fftwf.dft(2, new int[] { Asy, Asx }, Bf, Aa, fftw_direction.Backward, fftw_flags.Estimate);//equal results
IntPtr backward = fftwf.dft_c2r_2d(Asy, Asx, Bf, Aa, fftw_flags.Estimate);
/**
* The two forward transforms of A and B. Before we do so we copy the normalized
* data into the double array. For B we also pad the data with 0
*/
for (int row = 0; row < Asy; row++)
for (int col = 0; col < Asx; col++)
((double*)Aa)[col + Asx * row] = A[col + Asx * row];
fftw.execute(forwardA);
for (int j = 0; j < Asx * Asy; j++)
((double*)Ba)[j] = 0;
for (int row = 0; row < Bsy; row++)
for (int col = 0; col < Bsx; col++)
((double*)Ba)[col + Asx * row] = B[col + Bsx * row];
fftw.execute(forwardB);
/**
* The inproduct of the two frequency domains and calculation
* of the crosscorrelations
*/
double norm = Asx * Asy;
for (int j = 0; j < Asx2 * Asy; j++)
{
double a = ((double*)Af)[j * 2];//#Af[j][0];
double b = ((double*)Af)[j * 2 + 1];//#Af[j][1];
double c = ((double*)Bf)[j * 2];//#Bf[j][0];
double d = ((double*)Bf)[j * 2 + 1];//#-Bf[j][1];
double e = a * c - b * d;
double f = a * d + b * c;
((double*)Bf)[j * 2] = (double)(e / norm);//#Bf[j][0] = (fftw_real)(e / norm);
((double*)Bf)[j * 2 + 1] = (double)(f / norm);//Bf[j][1] = (fftw_real)(f / norm);
}
fftw.execute(backward);
/**
* We now have a correlation map. We can spent one more pass
* over the entire image to actually find the best matching images
* as defined by the SAD.
* We calculate this by gridding the entire image according to the
* size of the subimage. In each cel we want to know what the best
* match is.
*/
int sa = 1 + Asx / Bsx;
int sb = 1 + Asy / Bsy;
int sadx = 0;
int sady = 0;
signed8 minsad = Bsx * Bsy * 256L;
for (int a = 0; a < sa; a++)
{
int xl = a * Bsx;
int xr = xl + Bsx;
if (xr > Asx) continue;
for (int b = 0; b < sb; b++)
{
int yl = b * Bsy;
int yr = yl + Bsy;
if (yr > Asy) continue;
// find the maximum correlation in this cell
int cormxat = xl + yl * Asx;
double cormx = crosscorrs[cormxat];
for (int x = xl; x < xr; x++)
for (int y = yl; y < yr; y++)
{
int j = x + y * Asx;
if (crosscorrs[j] > cormx)
cormx = crosscorrs[cormxat = j];
}
int corx = cormxat % Asx;
int cory = cormxat / Asx;
// We dont want subimages that fall of the larger image
if (corx + Bsx > Asx) continue;
if (cory + Bsy > Asy) continue;
signed8 sad = 0;
for (int x = 0; sad < minsad && x < Bsx; x++)
for (int y = 0; y < Bsy; y++)
{
int j = (x + corx) + (y + cory) * Asx;
int i = x + y * Bsx;
sad += Math.Abs((int)Bsad[i] - (int)Asad[j]);
}
if (sad < minsad)
{
minsad = sad;
sadx = corx;
sady = cory;
// printf("* ");
}
// printf("Grid (%d,%d) (%d,%d) Sip=%g Sad=%lld\n",a,b,corx,cory,cormx,sad);
}
}
//Console.Write("{0:D}\t{1:D}\n", sadx, sady);
/**
* Aa, Ba, Af and Bf were allocated in this function
* crosscorrs was an alias for Aa and does not require deletion.
*/
fftw.free(Aa);
fftw.free(Ba);
fftw.free(Af);
fftw.free(Bf);
return new Point(sadx, sady);
}
private static void normalize(ref signed2[] img, int sx, int sy, int wx, int wy)
{
/**
* Calculate the mean background. We will subtract this
* from img to make sure that it has a mean of zero
* over a window size of wx x wy. Afterwards we calculate
* the square of the difference, which will then be used
* to normalize the local variance of img.
*/
signed2[] mean = boxaverage(img, sx, sy, wx, wy);
signed2[] sqr = new signed2[sx * sy];
for (int j = 0; j < sx * sy; j++)
{
img[j] -= mean[j];
signed2 v = img[j];
sqr[j] = (signed2)(v * v);
}
signed2[] var = boxaverage(sqr, sx, sy, wx, wy);
/**
* The normalization process. Currenlty still
* calculated as doubles. Could probably be fixed
* to integers too. The only problem is the sqrt
*/
for (int j = 0; j < sx * sy; j++)
{
double v = Math.Sqrt(Math.Abs((double)var[j]));//#double v = sqrt(fabs(var[j])); <- ambigous
Debug.Assert(!double.IsInfinity(v) && v >= 0);
if (v < 0.0001) v = 0.0001;
img[j] = (signed2)(img[j] * (32 / v));
if (img[j] > 127) img[j] = 127;
if (img[j] < -127) img[j] = -127;
}
/**
* As a last step in the normalization we
* window the sub image around the borders
* to become 0
*/
window(ref img, sx, sy, wx, wy);
}
private static signed2[] boxaverage(signed2[] input, int sx, int sy, int wx, int wy)
{
signed2[] horizontalmean = new signed2[sx * sy];
Debug.Assert(horizontalmean != null);
int wx2 = wx / 2;
int wy2 = wy / 2;
int from = (sy - 1) * sx;
int to = (sy - 1) * sx;
int initcount = wx - wx2;
if (sx < initcount) initcount = sx;
int xli = -wx2;
int xri = wx - wx2;
for (; from >= 0; from -= sx, to -= sx)
{
signed8 sum = 0;
int count = initcount;
for (int c = 0; c < count; c++)
sum += (signed8)input[c + from];
horizontalmean[to] = (signed2)(sum / count);
int xl = xli, x = 1, xr = xri;
/**
* The area where the window is slightly outside the
* left boundary. Beware: the right bnoundary could be
* outside on the other side already
*/
for (; x < sx; x++, xl++, xr++)
{
if (xl >= 0) break;
if (xr < sx)
{
sum += (signed8)input[xr + from];
count++;
}
horizontalmean[x + to] = (signed2)(sum / count);
}
/**
* both bounds of the sliding window
* are fully inside the images
*/
for (; xr < sx; x++, xl++, xr++)
{
sum -= (signed8)input[xl + from];
sum += (signed8)input[xr + from];
horizontalmean[x + to] = (signed2)(sum / count);
}
/**
* the right bound is falling of the page
*/
for (; x < sx; x++, xl++)
{
sum -= (signed8)input[xl + from];
count--;
horizontalmean[x + to] = (signed2)(sum / count);
}
}
/**
* The same process as aboe but for the vertical dimension now
*/
int ssy = (sy - 1) * sx + 1;
from = sx - 1;
signed2[] verticalmean = new signed2[sx * sy];
Debug.Assert(verticalmean != null);
to = sx - 1;
initcount = wy - wy2;
if (sy < initcount) initcount = sy;
int initstopat = initcount * sx;
int yli = -wy2 * sx;
int yri = (wy - wy2) * sx;
for (; from >= 0; from--, to--)
{
signed8 sum = 0;
int count = initcount;
for (int d = 0; d < initstopat; d += sx)
sum += (signed8)horizontalmean[d + from];
verticalmean[to] = (signed2)(sum / count);
int yl = yli, y = 1, yr = yri;
for (; y < ssy; y += sx, yl += sx, yr += sx)
{
if (yl >= 0) break;
if (yr < ssy)
{
sum += (signed8)horizontalmean[yr + from];
count++;
}
verticalmean[y + to] = (signed2)(sum / count);
}
for (; yr < ssy; y += sx, yl += sx, yr += sx)
{
sum -= (signed8)horizontalmean[yl + from];
sum += (signed8)horizontalmean[yr + from];
verticalmean[y + to] = (signed2)(sum / count);
}
for (; y < ssy; y += sx, yl += sx)
{
sum -= (signed8)horizontalmean[yl + from];
count--;
verticalmean[y + to] = (signed2)(sum / count);
}
}
return verticalmean;
}
private static void window(ref signed2[] img, int sx, int sy, int wx, int wy)
{
int wx2 = wx / 2;
int sxsy = sx * sy;
int sx1 = sx - 1;
for (int x = 0; x < wx2; x++)
for (int y = 0; y < sxsy; y += sx)
{
img[x + y] = (signed2)(img[x + y] * x / wx2);
img[sx1 - x + y] = (signed2)(img[sx1 - x + y] * x / wx2);
}
int wy2 = wy / 2;
int syb = (sy - 1) * sx;
int syt = 0;
for (int y = 0; y < wy2; y++)
{
for (int x = 0; x < sx; x++)
{
/**
* here we need to recalculate the stuff (*y/wy2)
* to preserve the discrete nature of integers.
*/
img[x + syt] = (signed2)(img[x + syt] * y / wy2);
img[x + syb] = (signed2)(img[x + syb] * y / wy2);
}
/**
* The next row for the top rows
* The previous row for the bottom rows
*/
syt += sx;
syb -= sx;
}
}
private static signed2[] read_image(string filename, ref int sx, ref int sy)
{
Bitmap image = new Bitmap(filename);
sx = image.Width;
sy = image.Height;
signed2[] GreyImage = new signed2[sx * sy];
BitmapData bitmapData1 = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
unsafe
{
byte* imagePointer = (byte*)bitmapData1.Scan0;
for (int y = 0; y < bitmapData1.Height; y++)
{
for (int x = 0; x < bitmapData1.Width; x++)
{
GreyImage[x + y * sx] = (signed2)((imagePointer[0] + imagePointer[1] + imagePointer[2]) / 3.0);
//4 bytes per pixel
imagePointer += 4;
}//end for x
//4 bytes per pixel
imagePointer += bitmapData1.Stride - (bitmapData1.Width * 4);
}//end for y
}//end unsafe
image.UnlockBits(bitmapData1);
return GreyImage;
}
}
import static marvin.MarvinPluginCollection.*;
public class FindSubimageWindow {
public FindSubimageWindow(){
MarvinImage window = MarvinImageIO.loadImage("./res/window.png");
MarvinImage eclipse = MarvinImageIO.loadImage("./res/eclipse_icon.png");
MarvinImage progress = MarvinImageIO.loadImage("./res/progress_icon.png");
MarvinSegment seg1, seg2;
seg1 = findSubimage(eclipse, window, 0, 0);
drawRect(window, seg1.x1, seg1.y1, seg1.x2-seg1.x1, seg1.y2-seg1.y1);
seg2 = findSubimage(progress, window, 0, 0);
drawRect(window, seg2.x1, seg2.y1, seg2.x2-seg2.x1, seg2.y2-seg2.y1);
drawRect(window, seg1.x1-10, seg1.y1-10, (seg2.x2-seg1.x1)+25, (seg2.y2-seg1.y1)+20);
MarvinImageIO.saveImage(window, "./res/window_out.png");
}
private void drawRect(MarvinImage image, int x, int y, int width, int height){
x-=4; y-=4; width+=8; height+=8;
image.drawRect(x, y, width, height, Color.red);
image.drawRect(x+1, y+1, width-2, height-2, Color.red);
image.drawRect(x+2, y+2, width-4, height-4, Color.red);
}
public static void main(String[] args) {
new FindSubimageWindow();
}
}