Python 使用OpenCV对pytesseract OCR图像进行预处理_Python_Opencv_Image Processing_Ocr_Python Tesseract

Python 使用OpenCV对pytesseract OCR图像进行预处理

python opencv image-processing

Python 使用OpenCV对pytesseract OCR图像进行预处理,python,opencv,image-processing,ocr,python-tesseract,Python,Opencv,Image Processing,Ocr,Python Tesseract,我想使用OCR（pytesseract）识别图像中的文本，如下所示：我有成千上万支这样的箭。到目前为止，步骤如下：首先调整图像大小（用于另一个过程）。然后我裁剪图像以去除箭头的大部分。接下来，我画一个白色矩形作为一个框架，以消除进一步的噪声，但文本和图像边界之间仍然有距离，以便更好地识别文本。我再次调整图像的大小，以确保大写字母的高度达到~30 px（）。最后，我用150的阈值对图像进行二值化完整代码： import cv2 image_file = '001.jpg' # load

我想使用OCR（pytesseract）识别图像中的文本，如下所示：

我有成千上万支这样的箭。到目前为止，步骤如下：首先调整图像大小（用于另一个过程）。然后我裁剪图像以去除箭头的大部分。接下来，我画一个白色矩形作为一个框架，以消除进一步的噪声，但文本和图像边界之间仍然有距离，以便更好地识别文本。我再次调整图像的大小，以确保大写字母的高度达到~30 px（）。最后，我用150的阈值对图像进行二值化

完整代码：

import cv2

image_file = '001.jpg'

# load the input image and grab the image dimensions
image = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
(h_1, w_1) = image.shape[:2]

# resize the image and grab the new image dimensions
image = cv2.resize(image, (int(w_1*320/h_1), 320))
(h_1, w_1) = image.shape

# crop image
image_2 = image[70:h_1-70, 20:w_1-20]

# get image_2 height, width
(h_2, w_2) = image_2.shape

# draw white rectangle as a frame around the number -> remove noise
cv2.rectangle(image_2, (0, 0), (w_2, h_2), (255, 255, 255), 40)

# resize image, that capital letters are ~ 30 px in height
image_2 = cv2.resize(image_2, (int(w_2*50/h_2), 50))

# image binarization
ret, image_2 = cv2.threshold(image_2, 150, 255, cv2.THRESH_BINARY)

# save image to file
cv2.imwrite('processed_' + image_file, image_2)

# tesseract part can be commented out
import pytesseract
config_7 = ("-c tessedit_char_whitelist=0123456789AB --oem 1 --psm 7")
text = pytesseract.image_to_string(image_2, config=config_7)
print("OCR TEXT: " + "{}\n".format(text))

问题是箭头中的文本从未居中。有时，我使用上述方法删除部分文本（例如，在图像50A中）

在图像处理中有没有一种方法可以更优雅地去除箭头？例如，使用轮廓检测和删除？我更感兴趣的是OpenCV部分，而不是tesseract部分来识别文本

非常感谢您的帮助。

如果您查看图片，您将看到图像中有一个白色箭头，也是最大的轮廓（特别是如果您在图像上绘制黑色边框）。如果制作一个空白遮罩并绘制箭头（图像上最大的轮廓），然后稍微腐蚀它，则可以执行实际图像和腐蚀遮罩的逐元素逐位合并。如果不清楚，请查看下面的代码和注释，您将看到它实际上非常简单

# imports
import cv2
import numpy as np

img = cv2.imread("number.png")  # read image
# you can resize the image here if you like - it should still work for both sizes
h, w = img.shape[:2]  # get the actual images height and width
img = cv2.resize(img, (int(w*320/h), 320))
h, w = img.shape[:2]

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # transform to grayscale
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]  # perform OTSU threhold
cv2.rectangle(thresh, (0, 0), (w, h), (0, 0, 0), 2)
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0]  # search for contours
max_cnt = max(contours, key=cv2.contourArea)  # select biggest one
mask = np.zeros((h, w), dtype=np.uint8)  # create a black mask
cv2.drawContours(mask, [max_cnt], -1, (255, 255, 255), -1)  # draw biggest contour on the mask
kernel = np.ones((15, 15), dtype=np.uint8)  # make a kernel with appropriate values - in both cases (resized and original) 15 is ok
erosion = cv2.erode(mask, kernel, iterations=1)  # erode the mask with given kernel

reverse = cv2.bitwise_not(img.copy())  # reversed image of the actual image 0 becomes 255 and 255 becomes 0
img = cv2.bitwise_and(reverse, reverse, mask=erosion)  # per-element bit-wise conjunction of the actual image and eroded mask (erosion)
img = cv2.bitwise_not(img)  # revers the image again

# save image to file and display
cv2.imwrite("res.png", img)
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

结果:

您可以尝试使用简单的Python脚本：

import cv2
import numpy as np
img = cv2.imread('mmubS.png', cv2.IMREAD_GRAYSCALE)
thresh = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV )[1]
im_flood_fill = thresh.copy()
h, w = thresh.shape[:2]
im_flood_fill=cv2.rectangle(im_flood_fill, (0,0), (w-1,h-1), 255, 2)
mask = np.zeros((h + 2, w + 2), np.uint8)
cv2.floodFill(im_flood_fill, mask, (0, 0), 0)
im_flood_fill = cv2.bitwise_not(im_flood_fill)
cv2.imshow('clear text', im_flood_fill)
cv2.imwrite('text.png', im_flood_fill)

结果: