Python IndexError: bytearray when using the Google Cloud Vision API

python arrays image-processing ocr google-cloud-vision

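I am trying to use the Google Cloud Vision API to detect text in an image. I followed the code from the following tutorial:

The full code is as follows: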

import argparse
from enum import Enum
import io

from google.cloud import vision
from PIL import Image, ImageDraw

class FeatureType(Enum):
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5


def draw_boxes(image, bounds, color):
    """Draw a border around the image using the hints in the vector list."""
    draw = ImageDraw.Draw(image)

    for bound in bounds:
        draw.polygon([
            bound.vertices[0].x, bound.vertices[0].y,
            bound.vertices[1].x, bound.vertices[1].y,
            bound.vertices[2].x, bound.vertices[2].y,
            bound.vertices[3].x, bound.vertices[3].y], None, color)
    return image


def get_document_bounds(image_file, feature):
    """Returns document bounds given an image."""
    client = vision.ImageAnnotatorClient()

    bounds = []

    with io.open(image_file, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if (feature == FeatureType.SYMBOL):
                            bounds.append(symbol.bounding_box)

                    if (feature == FeatureType.WORD):
                        bounds.append(word.bounding_box)

                if (feature == FeatureType.PARA):
                    bounds.append(paragraph.bounding_box)

            if (feature == FeatureType.BLOCK):
                bounds.append(block.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds


def render_doc_text(filein, fileout):
    image = Image.open(filein)
    bounds = get_document_bounds(filein, FeatureType.BLOCK)
    draw_boxes(image, bounds, 'red')
    bounds = get_document_bounds(filein, FeatureType.PARA)
    draw_boxes(image, bounds, 'red')
    bounds = get_document_bounds(filein, FeatureType.WORD)
    draw_boxes(image, bounds, 'red')

    if fileout != 0:
        image.save(fileout)
    else:
        image.show()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('detect_file', help='The image for text detection.')
    parser.add_argument('-out_file', help='Optional output file', default=0)
    args = parser.parse_args()

    render_doc_text(args.detect_file, args.out_file)

I am using Windows 10 with Python 3.7, and I ran the following command in the command prompt:

C:\Users\ariel\Dropbox\Research\Mestizo\Code>python doctext.py censo_19940_tab_corta-30.png -out_file out.jpg

I got the following error and traceback:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImagePalette.py", line 99, in getcolor
    return self.colors[color]
KeyError: (255, 0, 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "doctext.py", line 96, in <module>
    render_doc_text(args.detect_file, args.out_file)
  File "doctext.py", line 78, in render_doc_text
    draw_boxes(image, bounds, 'red')
  File "doctext.py", line 35, in draw_boxes
    bound.vertices[3].x, bound.vertices[3].y], None, color)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImageDraw.py", line 239, in polygon
    ink, fill = self._getink(outline, fill)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImageDraw.py", line 113, in _getink
    ink = self.palette.getcolor(ink)
  File "C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImagePalette.py", line 109, in getcolor
    self.palette[index + 256] = color[1]
IndexError: bytearray index out of range

I have looked through previous posts about this error, but I can't figure out where it is coming from.

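The problem was that I was using a PNG file when I should have been using a JPG. The documentation for the code/tutorial uses a JPG file as input:

After converting the image to JPG, the code ran without problems and produced the expected output.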
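
For anyone who would rather not convert files by hand, here is a minimal sketch of doing the equivalent in memory with Pillow, assuming the failure comes from the image mode rather than from the file extension. `open_as_rgb` and `draw_box` are hypothetical helper names, not part of the tutorial code, and the demo uses a generated palette-mode image instead of the original PNG.

```python
from PIL import Image, ImageDraw


def open_as_rgb(path_or_image):
    """Return an RGB copy of an image, given a path or an already-open Image.

    Hypothetical helper for illustration; not part of the tutorial code.
    """
    if isinstance(path_or_image, Image.Image):
        image = path_or_image
    else:
        image = Image.open(path_or_image)
    # convert() returns a new image; palette ("P") or RGBA input becomes RGB.
    return image.convert("RGB")


def draw_box(image, corners, color):
    """Outline a four-corner box given as [(x, y), ...] on the image."""
    draw = ImageDraw.Draw(image)
    draw.polygon(corners, outline=color)
    return image


# Simulate a palette-mode PNG in memory rather than reading one from disk.
palette_image = Image.new("RGB", (20, 20), "white").convert("P")
rgb_image = open_as_rgb(palette_image)
draw_box(rgb_image, [(2, 2), (17, 2), (17, 17), (2, 17)], "red")
```

In the script from the question, the same idea would amount to calling `.convert("RGB")` on the result of `Image.open(filein)` in `render_doc_text` before any boxes are drawn.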

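To add to this: as far as I understand, the root cause is that the code expects an RGB image, while your PNG file is RGBA, and that is what triggers the error. JPG files are almost always RGB, whereas PNG files may or may not be. Using an RGB PNG file works as well. Also, the text annotation itself should work for all JPG/PNG files; only the PIL part is affected by the error. The same error, likewise caused by not using the correct RGB format, can be seen in another post that links to a neat explanation.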
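
To check which kind of PNG you actually have, the colour type can be read straight from the file header. The sketch below uses only the standard library and relies on the IHDR layout from the PNG specification. `png_color_type` and `make_png_header` are hypothetical names introduced for illustration. Note that the traceback in the question passes through `ImagePalette.getcolor`, which suggests the file may have been a palette (indexed) PNG rather than RGBA; that is my reading of the traceback, not something confirmed in the thread.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour type codes defined by the PNG specification (IHDR chunk).
PNG_COLOR_TYPES = {
    0: "grayscale",
    2: "RGB",
    3: "palette (indexed)",
    4: "grayscale + alpha",
    6: "RGBA",
}


def png_color_type(data):
    """Return a label for the colour type stored in a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # After the signature: 4-byte chunk length, b"IHDR", then 13 header bytes.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # Width, height, bit depth and colour type are the first 10 header bytes.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return PNG_COLOR_TYPES.get(color_type, "unknown ({})".format(color_type))


def make_png_header(color_type):
    """Build a signature plus IHDR chunk for a demo (not a complete image)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, color_type, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + ihdr) & 0xFFFFFFFF
    return (PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR"
            + ihdr + struct.pack(">I", crc))


for code in (2, 3, 6):
    print(code, png_color_type(make_png_header(code)))
```

For a real file, reading the first 26 bytes in binary mode and passing them to `png_color_type` is enough, since the function never looks past the IHDR header fields.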