Python: How to remove text from sketch images

python python-3.x opencv


I have some sketch images that contain text captions, and I am trying to remove those captions.

I am using the following code:

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find contours and filter using contour area
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 500:
        cv2.drawContours(thresh, [c], -1, 0, -1)

# Invert image and OCR
invert = 255 - thresh
output = thresh - invert
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.imshow('output', output)
cv2.waitKey()

This code does not work for these images.

No cv2 preprocessing is needed here; tesseract can find the text on its own. See the example below, with comments inline:

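import cv2
import pytesseract

# load the image so the detected text can be painted over
image = cv2.imread('1.png')

results = pytesseract.image_to_data('1.png', config='--psm 11', output_type='dict')
for i in range(len(results["text"])):
    # extract the bounding box coordinates of the text region from
    # the current result
    x = results["left"][i]
    y = results["top"][i]
    w = results["width"][i]
    h = results["height"][i]
    # extract the confidence of the text (it may be reported as a float or a string)
    conf = int(float(results["conf"][i]))
    if conf > 60:  # adjust to your liking
        # cover the text with a white rectangle
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 255), -1)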

In the resulting figure, the detected text is shown on the left and the cleaned image on the right.
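To make the confidence filter easier to experiment with, the box selection can be separated from the drawing. The following is only a minimal sketch: the helper name `select_text_boxes` and the hand-written `sample` dictionary are made up for illustration, and the sample merely imitates the keys that `pytesseract.image_to_data(..., output_type='dict')` returns. It also skips entries whose text is empty, since, as far as I know, Tesseract reports page, block and line rows with a confidence of -1 and no text.

```python
def select_text_boxes(results, min_conf=60):
    """Return (x, y, w, h) boxes for confidently recognised, non-empty words.

    `results` is assumed to be a dict shaped like the output of
    pytesseract.image_to_data(..., output_type='dict').
    """
    boxes = []
    for i, text in enumerate(results["text"]):
        # conf may arrive as an int, a float, or a string such as "96.5" or "-1"
        conf = float(results["conf"][i])
        # keep only confident detections that actually contain characters
        if conf > min_conf and str(text).strip():
            boxes.append((results["left"][i], results["top"][i],
                          results["width"][i], results["height"][i]))
    return boxes


# Hypothetical stand-in for real OCR output, so the sketch runs without Tesseract
sample = {
    "text":   ["", "Fig.", "1", " "],
    "conf":   ["-1", "91.5", 42, 95],
    "left":   [0, 10, 60, 80],
    "top":    [0, 200, 200, 200],
    "width":  [300, 40, 12, 5],
    "height": [250, 18, 18, 18],
}

boxes = select_text_boxes(sample)
print(boxes)  # [(10, 200, 40, 18)]
```

Each returned box can then be passed to cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 255), -1), as in the answer's loop.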

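Another option, without using Tesseract: just use the area of each contour's bounding rectangle, and filter out the smaller contours by covering them with white filled rectangles: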

# Imports
import cv2
import numpy as np

# Read image
imagePath = "C://opencvImages//"
inputImage = cv2.imread(imagePath+"0enxN.png")

# Convert BGR to grayscale:
binaryImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Invert image:
binaryImage = 255 - binaryImage

# Find the external contours on the binary image:
contours, hierarchy = cv2.findContours(binaryImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Invert image:
binaryImage = 255 - binaryImage

# Set minimum area threshold:
minArea = 1000

# Look for the bounding boxes:
for c in contours:

    # Get the contour's bounding rectangle and its dimensions:
    rectX, rectY, rectWidth, rectHeight = cv2.boundingRect(c)

    # Get Bounding Rectangle Area:
    rectArea = rectWidth * rectHeight

    # Check for minimum area:
    if rectArea < minArea:
        # Draw white rectangle to cover small contour:
        cv2.rectangle(binaryImage, (rectX, rectY), (rectX + rectWidth, rectY + rectHeight),
                      (255, 255, 255), -1)

# Show the cleaned image once all small contours have been covered:
cv2.imshow("Binary Mask", binaryImage)
cv2.waitKey(0)


This yields the cleaned image.

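Do all of your images have thin lines and bold captions, as in the example above? The drawings can be bold as well, but the captions are bold. I just want to remove the text. Thanks.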