Python: Unable to extract a word from an image


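I wrote a script using Python and pytesseract to extract a word from an image. The image contains only the single word TOOLS, and that is what I am after. At the moment, the script below gives me the wrong output, WIS. What can I do to get the right text?

Here is my script: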

import requests, io, pytesseract
from PIL import Image

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))
img = img.resize([100,100], Image.ANTIALIAS)
img = img.convert('L')
img = img.point(lambda x: 0 if x < 170 else 255)
imagetext = pytesseract.image_to_string(img)
print(imagetext)
# img.show()

Expected output:

TOOLS

The key problems in your implementation are these two lines:

img = img.resize([100,100], Image.ANTIALIAS)
img = img.point(lambda x: 0 if x < 170 else 255)

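To see why the threshold matters, here is a minimal pure-Python sketch of what the img.point(lambda ...) call does to a row of grayscale pixels. The pixel values are made up for illustration, and the assumption that the letters are textured (dark and light pixels mixed inside each stroke) comes from the later answer in this thread, not from measuring the image.

```python
def binarize(pixels, threshold):
    """Map each grayscale value to 0 (ink) or 255 (paper), like img.point(...)."""
    return [0 if value < threshold else 255 for value in pixels]


# Hypothetical grayscale values along one scan line: white paper (255),
# then a textured letter stroke whose pixels alternate dark and light.
scan_line = [255, 255, 90, 200, 60, 210, 80, 255, 255]

low = binarize(scan_line, 170)   # light texture pixels are lost as "paper"
high = binarize(scan_line, 250)  # everything darker than near-white is "ink"

print(low)
print(high)
```

With the lower threshold the stroke comes out full of holes, while the higher one keeps it solid. That is probably why thresholds around 250 do better on this image than 170.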

You can try different sizes and different thresholds:

import requests, io, pytesseract
from PIL import Image
from PIL import ImageFilter

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))
filters = [
    # ('nearest', Image.NEAREST),
    ('box', Image.BOX),
    # ('bilinear', Image.BILINEAR),
    # ('hamming', Image.HAMMING),
    # ('bicubic', Image.BICUBIC),
    ('lanczos', Image.LANCZOS),
]

subtle_filters = [
    # 'BLUR',
    # 'CONTOUR',
    'DETAIL',
    'EDGE_ENHANCE',
    'EDGE_ENHANCE_MORE',
    # 'EMBOSS',
    'FIND_EDGES',
    'SHARPEN',
    'SMOOTH',
    'SMOOTH_MORE',
]

for name, filt in filters:
    for subtle_filter_name in subtle_filters:
        for s in range(220, 250, 10):
            for threshold in range(250, 253, 1):
                img_temp = img.copy()
                img_temp.thumbnail([s,s], filt)
                img_temp = img_temp.convert('L')
                img_temp = img_temp.point(lambda x: 0 if x < threshold else 255)
                img_temp = img_temp.filter(getattr(ImageFilter, subtle_filter_name))
                imagetext = pytesseract.image_to_string(img_temp)
                print(s, threshold, name, subtle_filter_name, imagetext)
                with open('thumb%s_%s_%s_%s.jpg' % (s, threshold, name, subtle_filter_name), 'wb') as g:
                    img_temp.save(g)

See what works for you.

I suggest resizing the image while keeping its original aspect ratio. You could also try some alternatives to img_temp.convert('L').
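As a rough sketch of the aspect-ratio advice, the helper below computes a target size whose longer side matches a chosen length. The name fit_within and the 600x200 input size are hypothetical; the real dimensions of Tools.jpg are not given in this thread.

```python
def fit_within(size, max_side):
    """Return (width, height) scaled so the longer side equals max_side."""
    width, height = size
    scale = max_side / max(width, height)
    # Round to whole pixels and never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


original = (600, 200)  # hypothetical original size
new_size = fit_within(original, 240)
print(new_size)
```

The resulting tuple could then be passed to img.resize(new_size, Image.LANCZOS). Unlike resize([100, 100], ...), this does not squash a wide word into a square. Image.thumbnail, used in the loop above, also preserves the ratio, but it only shrinks images and never enlarges them.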

The best results so far were TWls and T0018.
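When a parameter sweep prints dozens of candidates, it may help to rank them automatically instead of reading them by eye. The sketch below scores each OCR output against the word you expect, using difflib from the standard library. The helper names are my own, and the comparison is case-insensitive by choice; this is only a crude similarity measure, not a proper OCR metric.

```python
from difflib import SequenceMatcher


def similarity(expected, candidate):
    """Return a score from 0 to 1; 1.0 means identical, ignoring case and
    surrounding whitespace."""
    a = expected.strip().upper()
    b = candidate.strip().upper()
    return SequenceMatcher(None, a, b).ratio()


def best_candidate(expected, candidates):
    """Pick the OCR output closest to the expected text."""
    return max(candidates, key=lambda text: similarity(expected, text))


# OCR outputs quoted in this thread.
outputs = ["WIS", "TWls", "T0018"]
for text in outputs:
    print(text, round(similarity("TOOLS", text), 2))
print("closest:", best_candidate("TOOLS", outputs))
```

Inside the sweep you would collect (parameters, imagetext) pairs and sort them by this score, which only works when you already know the expected word for a sample image.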

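You can also try editing the image by hand to see whether some manual edit gives better output.

If you know the font in advance, you will probably get better results as well.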

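The key is to match the image transformations to what tesseract can handle. Your main problem is that the font is not an ordinary one. All you need is: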

import io

import pytesseract
import requests
from PIL import Image, ImageEnhance, ImageFilter

response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
img = Image.open(io.BytesIO(response.content))

# remove texture
enhancer = ImageEnhance.Color(img)
img = enhancer.enhance(0)   # decolorize
img = img.point(lambda x: 0 if x < 250 else 255) # set threshold
img = img.resize([300, 100], Image.LANCZOS) # resize to remove noise
img = img.point(lambda x: 0 if x < 250 else 255) # get rid of remains of noise
# adjust font weight
img = img.filter(ImageFilter.MaxFilter(11)) # lighten the font ;)
imagetext = pytesseract.image_to_string(img)
print(imagetext)

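If only OCR were that simple... It looks like img = img.filter(ImageFilter.MaxFilter(11)) is the key :) Could you elaborate on the difference between img.convert('L') and ImageEnhance.Color(img).enhance(0)? Are there any best practices for the order of the steps?

1) What MaxFilter does is basically a morphological erosion of the dark text: each pixel takes the brightest value in its neighborhood, so the white background grows and the black strokes get thinner. 2) The difference is mainly conceptual: .convert('L') converts the colors to grayscale, while Color(img).enhance(0) removes the hue. 3) The order of the steps follows the processing logic: remove the pattern from the letters, convert to a black-and-white image, adjust the font weight, and send the result to tesseract. If the background were not white, I would work with the color channels and try other approaches, such as detecting long edges. Since this is a single image, I just put together something that works, and somehow it turned out to be robust.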
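To make the point about MaxFilter concrete, here is a small pure-Python sketch of a maximum filter applied to a black stroke on white paper. It is only an illustration of the idea, not Pillow's implementation; the grid, the 3x3 window and the function name are assumptions chosen to keep the example small.

```python
def max_filter(grid, size):
    """Replace each pixel with the maximum of its size x size neighborhood."""
    radius = size // 2
    height, width = len(grid), len(grid[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders.
            neighbors = [
                grid[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(max(neighbors))
        result.append(row)
    return result


W, B = 255, 0  # white paper, black ink
# A hypothetical vertical stroke, 5 pixels wide, on white paper.
image = [[W, B, B, B, B, B, W] for _ in range(5)]

thinned = max_filter(image, 3)
for row in thinned:
    print("".join("#" if value == B else "." for value in row))
```

A 3x3 window trims one pixel from each side of the stroke, so the 5-pixel stroke becomes 3 pixels wide. The answer's MaxFilter(11) uses an 11x11 window, so it trims correspondingly more, which is how it lightens a heavy font.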

Output of the script in this answer:

TOOLS