Python: How to extract text from a brochure image using pytesseract
I have tried to extract text from a brochure image.
Code:
import cv2
import pytesseract
from PIL import Image

im_folder = 'img_path'
im_gray = cv2.imread(im_folder + '/' + 'big-bazaar-wed-offer-may-21-2014.png', cv2.IMREAD_GRAYSCALE)

# Convert the grayscale image to a binary image
# (with THRESH_OTSU the threshold is chosen automatically; 128 is ignored)
(thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Enlarge the image 4x
img = cv2.resize(im_bw, None, fx=4, fy=4, interpolation=cv2.INTER_AREA)
cv2.imwrite('im_enhance.png', img)

# Text extraction
text = pytesseract.image_to_string(Image.open('im_enhance.png'))
print(text)
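Since this is a brochure image, I convert it to a binary image and enlarge it in order to get better OCR results.
With this code I can extract text, but some of the text is missed, especially the amounts/prices.
What changes should I make so that all of the text is extracted?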