Python: Why are the bounding boxes I draw inverted?

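I think I'm missing some very simple concept, or perhaps I don't understand the direction in which PIL.ImageDraw draws or in which pytesseract reports its output. Either way, my question is: why are my bounding boxes inverted?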

The sample code is as follows:

from PIL import Image, ImageDraw
import pytesseract
from pytesseract import Output

input_image = Image.open('input_sample.jpg')
tess_boxes = pytesseract.image_to_boxes(input_image,output_type=Output.DICT)
draw = ImageDraw.Draw(input_image)

for idx,character in enumerate(tess_boxes['char']):

    #Get each point needed to draw the box
    left = tess_boxes['left'][idx]
    right = tess_boxes['right'][idx]
    bottom = tess_boxes['bottom'][idx]
    top = tess_boxes['top'][idx]

    #Re-arranging these seem to have no effect
    # y = (left,top)
    # x = (right,bottom)
    # runs the same as the following: 
    y = (right,bottom)
    x = (left,top)

    #Swapping x and y here has no visible effect
    draw.rectangle((x,y),fill=None,outline="#FF0000",width=3)

input_image.save('output_sample.png', "PNG")

Input image

Output image

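pytesseract and PIL measure the Y axis in opposite directions, so the Y coordinates were wrong: image_to_boxes counts Y upward from the bottom edge of the image, while PIL counts Y downward from the top edge.

As the brilliant jasonharper suggested:

Just subtract each Y value from the height of the image before using it.

The code was adjusted in the following place: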

bottom = tess_boxes['bottom'][idx]
top = tess_boxes['top'][idx]

becomes

bottom = h-tess_boxes['bottom'][idx]
top = h-tess_boxes['top'][idx]

where h is the height of the image (w, h = input_image.size).

The result is now as expected, with each box surrounding its target character.

Thanks @jasonharper
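As a minimal sketch of the fix described in this answer, the flip can be wrapped in a small helper. The names tesseract_box_to_pil and draw_char_boxes are made up for this illustration, and the sketch assumes, as the answer does, that image_to_boxes reports Y values measured up from the bottom edge of the image.

```python
def tesseract_box_to_pil(left, bottom, right, top, image_height):
    """Convert one image_to_boxes box (origin bottom-left, Y grows upward)
    to PIL order (x0, y0, x1, y1) with origin top-left, Y growing downward."""
    return (left, image_height - top, right, image_height - bottom)


def draw_char_boxes(input_path, output_path):
    """Hypothetical wrapper: OCR an image and outline each character."""
    # Imported here so the helper above can be used without the OCR stack.
    from PIL import Image, ImageDraw
    import pytesseract
    from pytesseract import Output

    image = Image.open(input_path)
    _, height = image.size
    boxes = pytesseract.image_to_boxes(image, output_type=Output.DICT)
    draw = ImageDraw.Draw(image)

    for left, bottom, right, top in zip(
        boxes["left"], boxes["bottom"], boxes["right"], boxes["top"]
    ):
        # Flip the Y values before handing the box to PIL.
        draw.rectangle(
            tesseract_box_to_pil(left, bottom, right, top, height),
            outline="#FF0000",
            width=3,
        )

    image.save(output_path, "PNG")
```

For example, for an image 100 pixels high, tesseract_box_to_pil(10, 20, 30, 50, 100) gives (10, 50, 30, 80): the box's upper edge comes first, which is the order PIL expects.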

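You can also use image_to_data. You don't need to do any arithmetic.

import cv2
import pytesseract

# Load the image
img = cv2.imread("cRPKk.jpg")

# Convert to grayscale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OCR
d = pytesseract.image_to_data(gry, output_type=pytesseract.Output.DICT)
n_boxes = len(d["level"])
for i in range(n_boxes):
    # left/top/width/height already use a top-left origin
    (x, y, w, h) = (d["left"][i], d["top"][i], d["width"][i], d["height"][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

cv2.imshow("img", img)
cv2.waitKey(0)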

Result:
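The loop in this answer draws a rectangle for every row that image_to_data returns, including the page, block and line rows. A possible refinement, sketched here with a hypothetical helper named word_boxes, keeps only rows that carry recognised text. It assumes the dict has the "text", "conf", "left", "top", "width" and "height" keys, and that "conf" may arrive as a string or a number depending on the pytesseract version.

```python
def word_boxes(data, min_conf=0.0):
    """Yield (text, (x0, y0, x1, y1)) for each recognised word.

    `data` is assumed to be the dict returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    for text, conf, x, y, w, h in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        # Structural rows (page, block, line) have empty text and conf -1.
        if text.strip() and float(conf) >= min_conf:
            # top-left origin already, so no Y flip is needed here
            yield text, (x, y, x + w, y + h)
```

Each yielded box can be passed directly to PIL's draw.rectangle(box, outline="#FF0000", width=3), or split into two corner points for cv2.rectangle.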
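There seems to be a mismatch between a library that treats Y coordinates as increasing upward (the convention in mathematics) and a library that treats Y coordinates as increasing downward (common in computer graphics, because monitors have always scanned downward).

I agree with you @jasonharper, I'm just stuck on how to fix it. I haven't found any information on the scan direction of PIL or pytesseract. At this point I'm going to have PIL draw from 0,0 to 10,10 and see whether pytesseract itself has any drawing capability to do the same. I figure that should show me visually what is happening. Any suggestions are appreciated :) I mean... apart from being inverted, it works great! Lol

Just subtract each Y value from the height of the image before using it.

@jasonharper... you, my friend, deserve a cookie... I was so deep into this problem that I completely bypassed that simple logic. Thank you very much.

Awesome, thanks for the additional method. Regarding image_to_data versus image_to_boxes, can I assume that image_to_data reads in a different direction than image_to_boxes? Thanks again for the changes you made to solve this.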
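On the question in the last comment: going by the two answers, image_to_boxes reports bottom and top measured up from the bottom edge, while image_to_data reports top measured down from the top edge. Under that assumption, the following sketch (boxes_to_top_left is a hypothetical name) converts image_to_boxes output into the same left, top, width, height form that image_to_data uses, so one drawing routine can serve both.

```python
def boxes_to_top_left(boxes, image_height):
    """Convert an image_to_boxes dict into a list of
    (char, left, top, width, height) tuples with a top-left origin."""
    converted = []
    for char, left, bottom, right, top in zip(
        boxes["char"], boxes["left"], boxes["bottom"],
        boxes["right"], boxes["top"],
    ):
        converted.append(
            (
                char,
                left,
                image_height - top,  # distance from the top edge
                right - left,        # width
                top - bottom,        # height (top > bottom when Y grows upward)
            )
        )
    return converted
```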