Python: Removing background noise from a captcha image by replicating TesserCap's chop filter

python image-processing imagemagick

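I have a captcha image that looks like this:

Using TesserCap, a utility provided by McAfee, I can apply a "chop" filter to the image. (Before running it, I made sure the image contained only two colors, white and black.) I was very impressed with the results of using the filter with a value of 2 in the text box. It accurately removed most of the noise but kept the main text, like so:

I wanted to implement something similar in one of my own scripts, so I tried to find out which image processing library TesserCap uses. I couldn't find anything; it turns out it uses its own code to process the image. I then read an article that explains exactly how the program works. It gave me this description of what the chop filter does: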

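"The chopping filter replaces sequences with 0 (black) or 255 (white), as per the user's choice, if the number of contiguous pixels of a given grayscale value is less than the number provided in the numeric box. The CAPTCHA is analyzed in both the horizontal and vertical directions and the corresponding changes are made."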

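I'm not sure I understand what it is doing. My script is written in Python, so I tried using PIL to manipulate the pixels in the way the quote describes. It sounds simple enough, but I failed, probably because I don't really know what exactly the filter is doing:

(This was made from a slightly different captcha that uses a circular pattern.)

I also tried to see whether it could easily be done with ImageMagick's convert.exe. Its options are something else entirely. Using -median and some -morphology commands helped reduce some of the noise, but nasty dots appeared and the letters became very distorted. It was nowhere near as simple as applying the chop filter in TesserCap.

So, my question is this: how do I implement TesserCap's chop filter in Python, whether with PIL or with ImageMagick? The chop filter works far better than any of the alternatives I have tried, but I can't seem to replicate it. I have been working on this for hours and haven't found a solution yet.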

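Try something like this (pseudocode):

Then repeat the same for the columns. It looks like it would work at least a little. Moving horizontally and vertically like this will also remove horizontal/vertical lines.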
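
The row-then-column pass described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the image is taken to be a plain list of rows holding 0 (black) or 255 (white), and the names `chop_line` and `chop_grid` are made up for this sketch, not part of TesserCap or PIL.

```python
from itertools import groupby

BLACK, WHITE = 0, 255  # assumed pixel values for a two-color image


def chop_line(line, chop):
    """Return a copy of `line` with black runs of length <= chop turned white."""
    out = []
    for value, run in groupby(line):
        run = list(run)
        if value == BLACK and len(run) <= chop:
            out.extend([WHITE] * len(run))
        else:
            out.extend(run)
    return out


def chop_grid(grid, chop):
    """Chop every row, then every column, of a list-of-rows image."""
    rows = [chop_line(row, chop) for row in grid]
    # zip(*rows) transposes the image, so columns can reuse the row logic.
    cols = [chop_line(list(col), chop) for col in zip(*rows)]
    # Transpose back so the result is a list of rows again.
    return [list(row) for row in zip(*cols)]


B, W = BLACK, WHITE
grid = [
    [W, W, W, W, W, W],
    [B, B, B, B, B, B],  # a horizontal line one pixel thick
    [W, W, W, W, W, W],
    [W, B, B, B, W, W],  # a 3x3 block standing in for part of a letter
    [W, B, B, B, W, W],
    [W, B, B, B, W, W],
]
result = chop_grid(grid, 2)
```

With a chop factor of 2, the one-pixel-thick line survives the row pass (it is a run of 6) but disappears in the column pass, where it is a run of 1 in every column. The 3x3 block is longer than 2 in both directions and is left alone.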

The algorithm essentially looks for runs of consecutive target pixels (in this case, non-white pixels) and changes those pixels if the length of the run is less than or equal to the chop factor.

For example, in a sample row of pixels, applying a chop factor of 2 affects only the short runs: any sequence of black pixels that is 2 pixels long or shorter is replaced with white, while continuous sequences longer than 2 pixels are kept.
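
To make the single-row behavior concrete, here is a small sketch that treats a row as a string. The symbols `#` (black) and `-` (white), the sample row, and the helper name `chop_row` are all chosen here purely for illustration.

```python
import re


def chop_row(row, chop, black="#", white="-"):
    """Replace every run of `black` no longer than `chop` with `white`."""
    def replace_short(match):
        run = match.group()
        return white * len(run) if len(run) <= chop else run

    # One or more consecutive black symbols form a run.
    return re.sub(re.escape(black) + "+", replace_short, row)


before = "--#--###-##---#####---#-#"
after = chop_row(before, 2)
print(before)
print(after)  # -----###------#####------
```

The runs of length 1 and 2 vanish, while the runs of length 3 and 5 are untouched.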
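This is the result of the chop algorithm, as implemented in my Python code (below), on the original image from your post:

To apply this to the whole image, simply run the algorithm on every row and on every column. Here is Python code that does that: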
import sys

import PIL.Image

# Usage: python chop.py [chop-factor] [in-file] [out-file]

chop = int(sys.argv[1])
image = PIL.Image.open(sys.argv[2]).convert('1')
width, height = image.size
data = image.load()

# Iterate through the rows.
for y in range(height):

    # A while loop is used so that skipping past a run (x += total) actually
    # takes effect; a for loop would reset x on the next iteration.
    x = 0
    while x < width:

        # Make sure we're on a dark pixel.
        if data[x, y] >= 128:
            x += 1
            continue

        # Keep a total of non-white contiguous pixels.
        total = 0

        # Check a sequence ranging from x to image.width.
        for c in range(x, width):

            # If the pixel is dark, add it to the total.
            if data[c, y] < 128:
                total += 1

            # If the pixel is light, stop the sequence.
            else:
                break

        # If the total is at most the chop factor, replace the run with white.
        if total <= chop:
            for c in range(total):
                data[x + c, y] = 255

        # Skip past the run we just examined.
        x += total


# Iterate through the columns.
for x in range(width):

    # Same while-loop scan as above, this time moving down the column.
    y = 0
    while y < height:

        # Make sure we're on a dark pixel.
        if data[x, y] >= 128:
            y += 1
            continue

        # Keep a total of non-white contiguous pixels.
        total = 0

        # Check a sequence ranging from y to image.height.
        for c in range(y, height):

            # If the pixel is dark, add it to the total.
            if data[x, c] < 128:
                total += 1

            # If the pixel is light, stop the sequence.
            else:
                break

        # If the total is at most the chop factor, replace the run with white.
        if total <= chop:
            for c in range(total):
                data[x, y + c] = 255

        # Skip past the run we just examined.
        y += total

image.save(sys.argv[3])

As a side note, you could try using statistics. Outliers (2 sd) or extreme outliers (3.5 sd) allowed me to strip the captcha image (for legal purposes) without losing image quality.

Thanks! Another problem I'm facing is that each character is at a different angle. Is there any solution?

@kbhomes there is a small bug. Can you see that the letters got thinner? That's because "x += total" has no effect inside a for loop; you need to make it a while loop.

@AndreMiras how do I make "x += total" work in a while loop?

@JohnSmith well, use a while loop on x, then "x += 1" at the start of the loop; obviously x needs to be initialized to some value (say -1).
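
To illustrate the bug discussed in these comments, here is a sketch that compares a for-loop scan with a while-loop scan on a single row. The function names are hypothetical, and the row is assumed to be a plain list of 0 (black) and 255 (white) values rather than PIL pixel data. The while variant starts at 0 and advances inside each branch, which is a slight variation on the "initialize to -1 and increment at the start" idea suggested above.

```python
BLACK, WHITE = 0, 255  # assumed pixel values for a two-color image


def run_length(row, start):
    """Count contiguous black pixels beginning at index `start`."""
    n = 0
    while start + n < len(row) and row[start + n] == BLACK:
        n += 1
    return n


def chop_row_for(row, chop):
    """For-loop scan: the skip at the end is silently ignored."""
    row = list(row)
    for x in range(len(row)):
        if row[x] != BLACK:
            continue
        total = run_length(row, x)
        if total <= chop:
            row[x:x + total] = [WHITE] * total
        x += total  # no effect: range() hands out the next index anyway
    return row


def chop_row_while(row, chop):
    """While-loop scan: the skip moves past the whole run."""
    row = list(row)
    x = 0
    while x < len(row):
        if row[x] != BLACK:
            x += 1
            continue
        total = run_length(row, x)
        if total <= chop:
            row[x:x + total] = [WHITE] * total
        x += total  # jump to the first pixel after the run
    return row


row = [WHITE] + [BLACK] * 5 + [WHITE]
thinned = chop_row_for(row, 2)
intact = chop_row_while(row, 2)
```

With the for loop, the scan re-enters the 5-pixel run at every index; once only 2 pixels remain ahead of it, that tail counts as a short run and is whitened, which is why the letters come out thinner. The while loop examines each run exactly once and leaves the 5-pixel run intact.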