Python Pytesseract: "TesseractNotFoundError: tesseract is not installed or it's not in your path", how do I fix this?
I am trying to run a basic and very simple piece of code in Python:
from PIL import Image
import pytesseract
im = Image.open("sample1.jpg")
text = pytesseract.image_to_string(im, lang = 'eng')
print(text)
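This is what it looks like. I have actually installed Tesseract for Windows through the installer. I am very new to Python and don't know how to proceed.
Any guidance here would be very helpful. I have tried restarting the Spyder application, but it did not help.
You need to install tesseract; see the installation docs referenced above. From the docs: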
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
# Include the above line if you don't have the tesseract executable in your PATH
# Example tesseract_cmd: r'C:\Program Files (x86)\Tesseract-OCR\tesseract'
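As a small sketch (not part of pytesseract), the path can be looked up instead of hard-coded: `shutil.which` finds `tesseract` if it is already on PATH, and otherwise the function tries a list of candidate install locations. The helper name `find_tesseract` is hypothetical, and the two Windows paths are assumptions taken from example paths in this thread; adjust them to your installation.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed install locations, taken from example paths in this thread; edit as needed.
COMMON_WINDOWS_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates: Iterable[str] = COMMON_WINDOWS_PATHS) -> Optional[str]:
    """Return a usable path to the tesseract executable, or None if none is found."""
    # 1. If tesseract is already on PATH, shutil.which returns its full path.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 2. Otherwise fall back to the first candidate file that exists.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_tesseract() or "tesseract not found: install it or adjust the candidates")
```

If `find_tesseract()` returns a path, assign it to `pytesseract.pytesseract.tesseract_cmd` before calling `image_to_string`; if it returns `None`, the Tesseract program itself still needs to be installed, since pytesseract only wraps it.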
First, you should install the binary:
On Linux
On Mac
On Windows
Download the binary, then add
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
to your script.
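Then you should install the Python package using pip:
References: the pytesseract installation instructions (INSTALLATION section) and the Tesseract installation documentation.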
Install tesseract on Windows using the following pip commands:
pip install tesseract
pip install tesseract-ocr
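and check the file stored on your system at usr/appdata/local/programs/site-packages/python/python36/lib/pytesseract/pytesseract.py, and compile the file.
You can install this package... After that, go to the following path, C:\Program Files (x86)\Tesseract-OCR\tesseract.exe, and run the tesseract file.
I think this will help you...
I see the steps scattered across different answers. Based on my recent experience with this pytesseract error on Windows, here are the steps in order, to make the error easier to resolve:
1. Install tesseract using the Windows installer available at:
2. Note the tesseract path from the installation. The default installation path at the time of this edit was: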
C:\Users\USER\AppData\Local\Tesseract-OCR
It may change, so please check the installation path.
3. pip install pytesseract
4. Set the tesseract path in the script before calling image_to_string:
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
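To confirm that the path set in step 4 points at a working executable before calling `image_to_string`, a minimal stdlib-only check can run `tesseract --version` through `subprocess`. The function name `tesseract_version` is hypothetical; it returns the first line of the version banner, or `None` if the program cannot be started, which is roughly the situation pytesseract reports as `TesseractNotFoundError`.

```python
import subprocess
from typing import Optional


def tesseract_version(cmd: str) -> Optional[str]:
    """Run `<cmd> --version`; return the first output line, or None if it cannot start."""
    try:
        proc = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so the banner is captured either way
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers FileNotFoundError (wrong path / not installed) and PermissionError.
        return None
    lines = proc.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    # Example path from step 2; replace USER with your account name.
    print(tesseract_version(r"C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe"))
```

With pytesseract installed, `pytesseract.get_tesseract_version()` performs a similar check using whatever `tesseract_cmd` is currently configured.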
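Step 1:
Install tesseract on your system according to your OS.
The latest installers can be found at:
Step 2:
Install the following dependency libraries using pip:
pip install pytesseract
pip install opencv-python
pip install numpy
Step 3:
Sample code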
import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string

# Path of the working folder on disk; replace with your working folder
src_path = "C:\\Users\\<user>\\PycharmProjects\\ImageToText\\input\\"

# If you don't have the tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
# Note: assigning a Python variable does not set the TESSDATA_PREFIX environment
# variable that Tesseract reads; this line has no effect on its own
TESSDATA_PREFIX = 'C:/Program Files (x86)/Tesseract-OCR'


def get_string(img_path):
    # Read image with opencv (imread returns None instead of raising on failure)
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(img_path)
    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    # Write image after removing noise
    cv2.imwrite(src_path + "removed_noise.png", img)
    # Apply threshold to get an image with only black and white
    #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
    # Write the image after apply opencv to do some ...
    cv2.imwrite(src_path + "thres.png", img)
    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
    # Remove template file
    #os.remove(temp)
    return result


print('--- Start recognize text from image ---')
print(get_string(src_path + "image.png"))
print("------ Done -------")
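Two details of this sample may be worth knowing: dilating and eroding with a 1x1 kernel leaves the image unchanged, and because the threshold line is commented out, `thres.png` is just the grayscale image. The sketch below spells out the grayscale and global-threshold steps in plain NumPy so the arithmetic is visible; the function names are hypothetical, and in practice `cv2.cvtColor` and `cv2.threshold` (optionally with `cv2.THRESH_OTSU`) do essentially the same job.

```python
import numpy as np


def to_grayscale(bgr: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 BGR uint8 image (OpenCV channel order) to grayscale.

    Uses the BT.601 luma weights Y = 0.299 R + 0.587 G + 0.114 B, the formula
    OpenCV documents for COLOR_BGR2GRAY (OpenCV's fixed-point rounding may differ by 1).
    """
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.rint(gray).astype(np.uint8)


def binarize(gray: np.ndarray, thresh: int = 127) -> np.ndarray:
    """Global threshold: 255 where pixel > thresh, else 0 (like cv2.THRESH_BINARY)."""
    return np.where(gray > thresh, 255, 0).astype(np.uint8)


if __name__ == "__main__":
    # A tiny 1x2 BGR image: one white pixel and one pure-red pixel.
    demo = np.array([[[255, 255, 255], [0, 0, 255]]], dtype=np.uint8)
    print(binarize(to_grayscale(demo)))
```

The pytesseract README shows `image_to_string` accepting NumPy/OpenCV arrays directly, so a binarized array could likely be passed in without first writing `thres.png` to disk.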
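Windows only
1- You need to have Tesseract OCR installed on your computer.
Get it from here.
Download the appropriate version.
2- Add the Tesseract path to your system environment, i.e. edit the system variables.
3- Run pip install pytesseract
and pip install tesseract
4- Add this line to your Python script every time:
pytesseract.pytesseract.tesseract_cmd = 'C:/OCR/Tesseract-OCR/tesseract.exe' # your path may be different
5- Run the code.
On Windows, with the default Tesseract installation, you have to redirect the command path.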
On Mac, you can install it as shown below; this worked for me.
If you get errors like the following:
tesseract is not installed or it's not in your path
and
OSError: [Errno 12] Cannot allocate memory
this may be related to a swap memory allocation problem.
You can check this answer on allocating more swap memory. Hope it helps :)
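On Windows 64-bit, just add the following to the PATH environment variable:
"C:\Program Files\Tesseract-OCR"
and it will work.
I was able to solve it by updating the tesseract_cmd variable in the pytesseract.py file with the bin/tesseract path.
I had the same problem on Windows.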
I tried updating the environment variable for the tesseract path, but it didn't work.
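Environment variables are inherited when a process starts, so a PATH edited in the system settings is only seen by programs launched afterwards; Spyder, or the terminal it was started from, has to be restarted before `tesseract` becomes visible. As an alternative that affects only the running Python process and the programs it launches, the Tesseract folder can be prepended to `PATH` at runtime. This is a sketch: `prepend_to_path` is a hypothetical helper, and the folder you pass (for example `C:\Program Files\Tesseract-OCR`) must be the directory that actually contains `tesseract.exe`.

```python
import os
from typing import MutableMapping


def prepend_to_path(directory: str, env: MutableMapping[str, str] = os.environ) -> str:
    """Move or add `directory` to the front of PATH in `env` and return the new PATH.

    With the default env=os.environ this changes PATH for the running Python
    process and the programs it launches, not the system-wide setting.
    """
    # Drop empty entries and any existing copy of `directory`, then put it first.
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != directory]
    parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]
```

Calling `prepend_to_path(r"C:\Program Files\Tesseract-OCR")` at the top of a script, before any pytesseract call, lets the default `tesseract_cmd = 'tesseract'` be found without editing system settings.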
What worked for me was modifying pytesseract.py, which can be found at C:\Program Files\Python37\Lib\site-packages\pytesseract
or, more commonly, under C:\Users\YOUR USER\APPDATA\Python
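Before editing the installed file, its real location can be looked up instead of guessed: `importlib.util.find_spec` reports where a top-level module would be imported from, without importing it. A sketch with a hypothetical helper name; note that edits to files inside site-packages are overwritten when the package is upgraded or reinstalled, so setting `pytesseract.pytesseract.tesseract_cmd` in your own script is the more durable option.

```python
import importlib.util
from typing import Optional


def module_source(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not installed."""
    spec = importlib.util.find_spec(name)  # locates the module without importing it
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print(module_source("pytesseract"))  # None if pytesseract is not installed
```

For pytesseract this should print the package's `__init__.py`; `pytesseract.py` sits in the same folder.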
I changed one line as follows:
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:\Program Files\Tesseract-OCR\\tesseract.exe'
Note: I had to add an extra \ before tesseract, because Python interprets \t as a tab character, and otherwise you will get the following error message:
pytesseract.pytesseract.TesseractNotFoundError: C:\Program Files\Tessera
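The backslash problem described in this note can be reproduced without Tesseract. In an ordinary string literal `\t` becomes a tab character, so the path no longer names the file; a raw string, doubled backslashes, or forward slashes (as several answers in this thread use) all avoid it. The variable names are illustrative only.

```python
from pathlib import PureWindowsPath

# "\t" inside a normal string literal is a TAB character, so this path is broken:
broken = "C:\\Program Files\\Tesseract-OCR\tesseract.exe"

# Three spellings that all produce the intended path:
raw = r"C:\Program Files\Tesseract-OCR\tesseract.exe"        # raw string
escaped = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"   # every backslash doubled
forward = "C:/Program Files/Tesseract-OCR/tesseract.exe"      # forward slashes

# repr() makes the hidden tab visible as \t
print(repr(broken))
```
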
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
brew install tesseract
# {Windows 10 instructions}
# Before you use the script, you need to install the dependencies:
# 1. Download Tesseract from the official link:
# https://github.com/UB-Mannheim/tesseract/wiki
# 2. Install Tesseract
# I chose this path
# *Replace the user string in the path below with the name of the user you are using on your current machine
# C:\Users\user\AppData\Local\Tesseract-OCR\
# 3. Install Pillow for your Python version
# * The best way for me to install it is in this form (I'm using Python 3.7, and in my CMD I run this version of Python by typing py -3.7):
# * If you are using another version of Python, first check how you start Python from your CMD
# * On some machines, the command that runs Python from the CMD is different
# [examples]
# =================================
# PYTHON VERSION 3.7
# python
# python3.7
# python -3.7
# python 3.7
# python3
# python -3
# python 3
# py3.7
# py -3.7
# py 3.7
# py3
# py -3
# py 3
# PYTHON VERSION 3.6
# python
# python3.6
# python -3.6
# python 3.6
# python3
# python -3
# python 3
# py3.6
# py -3.6
# py 3.6
# py3
# py -3
# py 3
# PYTHON VERSION 2.7
# python
# python2.7
# python -2.7
# python 2.7
# python2
# python -2
# python 2
# py2.7
# py -2.7
# py 2.7
# py2
# py -2
# py 2
# ================================
# We are using pip to install the dependencies,
# because I start Python 3.7 with the following line:
# py -3.7
# Open the CMD on a Windows machine and type the following line:
# py -3.7 -m pip install pillow
# 4. Install pytesseract and tesseract for your Python version
# * The best way for me to install them is in this form (I'm using Python 3.7, and in my CMD I run this version of Python by typing py -3.7):
# We are using pip to install the dependencies
# Open the CMD on a Windows machine and type the following lines:
# py -3.7 -m pip install pytesseract
# py -3.7 -m pip install tesseract
#!/usr/bin/python
from PIL import Image
import pytesseract
import os
import getpass


def extract_text_from_image(image_file_name_arg):
    # IMPORTANT
    # This is the path on my machine if you followed my installation instructions above.
    # If you don't set the right path for tesseract.exe, the script will not work.
    username = getpass.getuser()
    # The line above gets the username for your machine automatically
    tesseract_exe_path_installation = "C:\\Users\\" + username + "\\AppData\\Local\\Tesseract-OCR\\tesseract.exe"
    pytesseract.pytesseract.tesseract_cmd = tesseract_exe_path_installation
    # Specify the directory of your image files manually, or use the line below if the
    # images are in an "images" folder under the current working directory
    # image_dir = "D:\\GIT\\ai_example\\extract_text_from_image\\images"
    image_dir = os.path.join(os.getcwd(), "images")
    image_file_name = image_file_name_arg
    # If your images are in a different format, change the extension (e.g. ".png")
    image_ext = ".jpg"
    image_path_dir = os.path.join(image_dir, image_file_name + image_ext)
    print("=============================================================================")
    print("image used is in the following path dir:")
    print("\t" + image_path_dir)
    print("=============================================================================")
    img = Image.open(image_path_dir)
    text = pytesseract.image_to_string(img, lang="eng")
    print(text)


# Set image_file_name_arg to your image's file name without the extension
# image_file_name_arg = "image_1"
image_file_name_arg = "image_2"
# image_file_name_arg = "image_3"
# image_file_name_arg = "image_4"
# image_file_name_arg = "image_5"
extract_text_from_image(image_file_name_arg)
# ==================================
# CREATED BY: SHERIFI
# e-mail: sherif_co@yahoo.com
# git-link for script: https://github.com/sherifi/ai_example.git
# ==================================
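The script above builds the per-user install path from `getpass.getuser()`. A sketch of an alternative, assuming Tesseract was installed per user under `AppData\Local` as in the instructions above: read the `LOCALAPPDATA` environment variable, which Windows normally sets to `C:\Users\<user>\AppData\Local` and which still works when the profile folder name differs from the login name. `default_tesseract_path` is a hypothetical helper; `PureWindowsPath` is used so the path logic can be checked on any OS.

```python
from pathlib import PureWindowsPath
from typing import Mapping, Optional


def default_tesseract_path(env: Mapping[str, str]) -> Optional[PureWindowsPath]:
    r"""Build <LOCALAPPDATA>\Tesseract-OCR\tesseract.exe, or None if LOCALAPPDATA is unset."""
    local = env.get("LOCALAPPDATA")
    if not local:
        return None
    # Joining with "/" on a PureWindowsPath inserts backslash separators.
    return PureWindowsPath(local) / "Tesseract-OCR" / "tesseract.exe"
```

On Windows, `pytesseract.pytesseract.tesseract_cmd = str(default_tesseract_path(os.environ))` could replace the string concatenation in `extract_text_from_image`, after checking that the result is not `None`.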
For Ubuntu 18.04
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr -y
sudo apt-get install tesseract-ocr-spa -y
tesseract --list-langs
List of available languages (3):
eng
osd
spa
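Once a language pack such as `tesseract-ocr-spa` is installed, its code can be passed as `lang='spa'` to `image_to_string`. A small sketch (hypothetical helper) that turns the `tesseract --list-langs` output shown above into a list, so a script can check that a language is available before using it. The header check uses `startswith` because, depending on the Tesseract version, the header line may also include the tessdata directory.

```python
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`.

    The header line starts with "List of available languages"; every other
    non-empty line is one code ("osd" is orientation/script detection data).
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


sample = """List of available languages (3):
eng
osd
spa"""
print(parse_list_langs(sample))  # ['eng', 'osd', 'spa']
```

Recent pytesseract versions also provide `pytesseract.get_languages()`, which returns this list directly.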
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
conda install -c conda-forge tesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img_text = pytesseract.image_to_string(Image.open(filename))
pytesseract.pytesseract.tesseract_cmd =r'C:/Program Files/Tesseract-OCR/tesseract.exe'
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev