Python: Segmenting letters in a CAPTCHA image

I wrote this algorithm in Python to read CAPTCHAs using scikit-image:

from skimage import img_as_ubyte, io
from skimage.color import rgb2gray

def process(self, image):
    """
    Processes a CAPTCHA by removing noise

    Args:
        image (str): The file path of the image to process
    """

    img = io.imread(image)  # RGB or RGBA array
    histogram = {}

    # Count how often each RGB color occurs
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            pixel = img[x, y]
            color = '%02x%02x%02x' % (pixel[0], pixel[1], pixel[2])
            histogram[color] = histogram.get(color, 0) + 1

    # Rank the colors from most to least frequent
    ranked = sorted(histogram, key=histogram.get, reverse=True)
    rank = {color: i for i, color in enumerate(ranked)}
    threshold = len(ranked) * 0.015

    # Whiten the 3 most common colors and every color ranked past the threshold
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            pixel = img[x, y]
            color = '%02x%02x%02x' % (pixel[0], pixel[1], pixel[2])
            index = rank[color]

            if index < 3 or index > threshold:
                img[x, y] = 255

    # Invert the colors, drop any alpha channel, convert to grayscale
    # and overwrite the original file
    gray = rgb2gray(~img[..., :3])
    io.imsave(image, img_as_ubyte(gray))

Before: [image]

After: [image]

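It works reasonably well, and I get decent results after running the output through Google's Tesseract OCR, but I would like to make it better. I think straightening the letters would give better results. My question is: how do I do that?

I understand that I need to box the letters somehow, like this: [image]

Then, for each character, rotate it by some number of degrees based on a vertical or horizontal line.

My initial idea was to find the center of a character (possibly by finding clusters of the most-used colors in the histogram) and then expand a box until it hits black, but I am still not sure how to go about it.

What are some common practices in image segmentation for achieving this result?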

Edit:

In the end, further refining the color filter and restricting Tesseract to characters only produced nearly 100% accurate results without any deskewing.
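The edit does not say how Tesseract was restricted. One plausible way, sketched here as an assumption, is Tesseract's `tessedit_char_whitelist` variable passed through the `pytesseract` wrapper. The helper names `tesseract_config` and `read_captcha`, the choice of ASCII letters as the allowed set, and the page segmentation mode are all illustrative and not taken from the question. Whether the whitelist is honored can also depend on the Tesseract version and engine in use.

```python
import string


def tesseract_config(allowed_chars=string.ascii_letters, psm=7):
    """Build a Tesseract option string that only allows `allowed_chars`.

    --psm 7 asks Tesseract to treat the image as a single line of text.
    """
    return "--psm {} -c tessedit_char_whitelist={}".format(psm, allowed_chars)


def read_captcha(path, allowed_chars=string.ascii_letters):
    """Run OCR on a cleaned CAPTCHA image (needs pytesseract and Pillow)."""
    # Imported here so the config helper works without the OCR stack installed
    import pytesseract
    from PIL import Image

    config = tesseract_config(allowed_chars)
    text = pytesseract.image_to_string(Image.open(path), config=config)
    return text.strip()
```

Calling `read_captcha` on the file written by `process` would then return only characters from the allowed set, provided the installed Tesseract build respects the whitelist.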

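What you want to do is technically called deskewing of objects in computer vision. To do it, you have to apply a geometric transformation to the object. I have a snippet of code that applies deskewing to (binary) objects. The code, which uses the OpenCV library, is the deskew function listed at the end of this page.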

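OpenCV seems more useful than any other module for this application, but it does not support Python 3 yet. Thank you. I would still need a way to find the areas to deskew, though. What are image moments?

OpenCV 3 supports Python 3; check their website for details. You do not need to find a specific area to deskew. You just pass each bounding rectangle as an image to the method: if the letter is slanted in either direction, it automatically finds the appropriate deskew coefficient, and if the letter is already upright, its geometry is left unchanged. Second, an image moment is a particular weighted average (moment) of the image pixels' intensities, or a function of such moments, usually chosen to have some attractive property or interpretation.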
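To make the idea of image moments concrete, here is a minimal NumPy sketch that computes the two central moments mu11 and mu02 by hand and returns their ratio, which is the shear coefficient the answer's deskew function uses. The helper name `shear_from_moments` and the two test images are illustrative assumptions, not part of the original answer.

```python
import numpy as np


def shear_from_moments(image):
    """Return mu11 / mu02 for a 2-D grayscale or binary image.

    mu_pq is the sum over all pixels of
    (x - x_mean)**p * (y - y_mean)**q * intensity,
    where x is the column index and y is the row index.
    """
    img = np.asarray(image, dtype=np.float64)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    if m00 == 0:
        return 0.0  # empty image: no slant to measure
    x_mean = (xs * img).sum() / m00
    y_mean = (ys * img).sum() / m00
    mu11 = ((xs - x_mean) * (ys - y_mean) * img).sum()
    mu02 = ((ys - y_mean) ** 2 * img).sum()
    if abs(mu02) < 1e-9:
        return 0.0  # no vertical extent: the ratio is undefined
    return mu11 / mu02


# An upright vertical bar
upright = np.zeros((20, 20))
upright[2:18, 9:11] = 1

# The same bar, shifted one column to the right for every row
slanted = np.zeros((20, 40))
for row in range(2, 18):
    slanted[row, 10 + row:12 + row] = 1

print(shear_from_moments(upright), shear_from_moments(slanted))
```

The ratio is close to zero for the upright bar and close to one for the bar that moves one column per row, so it can be read as "columns of horizontal drift per row".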

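skimage.measure.regionprops will give you those moments. Using the same idea Ankit mentioned above, you can use skimage.transform to do the deskewing.

I have the regions to deskew, provided by the regionprops function. As mentioned above, I can draw boxes around them with draw.line. I see there is a transform.AffineTransform. Assuming that is the transform I want, how do I put the two together?

I'm voting to close this question as off-topic because it would be a better fit for [...] or [...]. Can we move it there?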
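One possible way to put regionprops and AffineTransform together is sketched below. It is only an illustration under stated assumptions: the function names `region_shear`, `deskew_region` and `deskew_letters` are made up for this example, the input is assumed to be a binary image with letters as nonzero pixels, and each letter is assumed to be a single connected component. Recent scikit-image versions index `moments_central` as [row order, column order], which is what the sketch relies on.

```python
import numpy as np
from skimage.measure import label, regionprops
from skimage.transform import AffineTransform, warp


def region_shear(region):
    """Slant estimate from the central moments that regionprops provides.

    moments_central[2, 0] is the second moment along the rows, which
    OpenCV calls mu02.
    """
    mu = region.moments_central
    if abs(mu[2, 0]) < 1e-9:
        return 0.0
    return mu[1, 1] / mu[2, 0]


def deskew_region(region):
    """Return the region's mask, sheared horizontally to remove its slant."""
    crop = region.image.astype(float)  # mask cropped to the bounding box
    h = crop.shape[0]
    skew = region_shear(region)
    # warp() uses the transform as a map from output (x, y) to input (x, y);
    # the offset keeps the middle row of the crop in place.
    matrix = np.array([[1.0, skew, -0.5 * h * skew],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
    return warp(crop, AffineTransform(matrix=matrix), order=1)


def deskew_letters(binary):
    """Label connected components and deskew each one separately.

    Returns a list of (bbox, deskewed_crop) pairs ordered left to right.
    """
    mask = (np.asarray(binary) > 0).astype(np.uint8)
    regions = sorted(regionprops(label(mask)), key=lambda r: r.bbox[1])
    return [(r.bbox, deskew_region(r)) for r in regions]
```

Each returned crop is a float image in the range 0 to 1 with the same shape as the letter's bounding box, so it can be thresholded again or pasted back at the `bbox` position before running OCR.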

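import cv2
import numpy as np

def deskew(image, width):
    # `width` is unused; the size is taken from the image itself
    (h, w) = image.shape[:2]
    moments = cv2.moments(image)
    if abs(moments["mu02"]) < 1e-2:
        return image.copy()  # no measurable vertical spread: nothing to deskew
    skew = moments["mu11"] / moments["mu02"]
    # Horizontal shear; the offset keeps the middle row (y = h / 2) in place
    M = np.float32([[1, skew, -0.5 * h * skew], [0, 1, 0]])
    image = cv2.warpAffine(image, M, (w, h), flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
    return image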