Python: problem using image_to_string with tesseract
Tags: python, python-3.x, tesseract, archlinux
Hi everyone, I have a simple Python script that uses tesseract, but I think I am hitting a version-related problem or something similar. Here is the code:
from PIL import Image
import pytesseract
file = '/home/gxs/Downloads/a.png'
img = Image.open(file)
text = pytesseract.image_to_string(Image.open(file))
Running it gives me the following output (error):
TesseractError Traceback (most recent call last)
<ipython-input-1-65b8cbea5fe0> in <module>
4 img = Image.open(file)
5 #display(img)
----> 6 text = pytesseract.image_to_string(Image.open(file))
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
368 args = [image, 'txt', lang, config, nice, timeout]
369
--> 370 return {
371 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
372 Output.DICT: lambda: {'text': run_and_get_output(*args)},
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in <lambda>()
371 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
372 Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 373 Output.STRING: lambda: run_and_get_output(*args),
374 }[output_type]()
375
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
280 }
281
--> 282 run_tesseract(**kwargs)
283 filename = kwargs['output_filename_base'] + extsep + extension
284 with open(filename, 'rb') as output_file:
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
256 with timeout_manager(proc, timeout) as error_string:
257 if proc.returncode:
--> 258 raise TesseractError(proc.returncode, get_errors(error_string))
259
260
TesseractError: (-11, 'Tesseract Open Source OCR Engine v3.03 with Leptonica actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53')
Comment: You may be missing the OCR data for the trained models. Have you tried installing them?
Reply: I have already installed tesseract-data-eng, and I also tried installing all of them.
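The banner in the error reports Tesseract v3.03, and the question itself suspects a version problem. One way to narrow that down is to check which `tesseract` binaries are on `PATH` and what version each reports. The sketch below is only a diagnostic under that assumption, not a confirmed fix: the helper names `parse_tesseract_version`, `tesseract_candidates`, and `report` are made up for illustration, and it assumes a Unix-like system where the executable is called `tesseract` and accepts `--version`.

```python
import os
import re
import subprocess


def parse_tesseract_version(banner):
    """Return the version string found in tesseract banner text, or None.

    Handles both the error banner ("... OCR Engine v3.03 with Leptonica")
    and the usual `tesseract --version` output ("tesseract 4.1.1").
    """
    match = re.search(r"(?:tesseract\s+|v)(\d+(?:\.\d+)+)", banner, re.IGNORECASE)
    return match.group(1) if match else None


def tesseract_candidates(path_env=None):
    """List every executable named 'tesseract' on PATH, in lookup order."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found = []
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, "tesseract")
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in found:
            found.append(candidate)
    return found


def report():
    """Print each tesseract binary on PATH with the version it reports."""
    for binary in tesseract_candidates():
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, timeout=10
        )
        # Some builds print the version banner to stderr, so check both streams.
        banner = result.stdout + result.stderr
        print(binary, "->", parse_tesseract_version(banner))
    # TESSDATA_PREFIX tells tesseract where to look for its traineddata files.
    print("TESSDATA_PREFIX =", os.environ.get("TESSDATA_PREFIX"))
```

Calling `report()` lists the binaries in the order the shell would find them. If the first entry is an old build while the installed language data belongs to a newer release, that mismatch would be worth ruling out; `pytesseract.pytesseract.tesseract_cmd` can be set to the full path of a different binary to test this.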