Python pytesseract: "TesseractNotFoundError: tesseract is not installed or it's not in your path", how do I resolve this?


I'm trying to run a very basic and simple piece of Python code:

from PIL import Image
import pytesseract

im = Image.open("sample1.jpg")

text = pytesseract.image_to_string(im, lang = 'eng')

print(text)

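This is what it looks like. I have actually installed Tesseract for Windows through the installer. I am very new to Python and unsure how to proceed.

Any guidance here would be very helpful. I have tried restarting the Spyder application, but to no avail.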

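You need to install Tesseract itself. pytesseract is only a Python wrapper that calls the tesseract executable.

See the installation documentation.

From the documentation:

pytesseract.pytesseract.tesseract_cmd = ''  # put the full path to your tesseract executable here
# Include the above line if you don't have the tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'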
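One way to tell whether that line is needed is to ask Python whether a `tesseract` executable is visible on PATH. The sketch below uses only the standard library's `shutil.which`. The helper name `find_tesseract_on_path` is made up for illustration and is not part of pytesseract.

```python
import shutil


def find_tesseract_on_path(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


location = find_tesseract_on_path()
if location is None:
    # Nothing on PATH: tesseract_cmd has to be set to the full path by hand.
    print("tesseract is not on PATH; set pytesseract.pytesseract.tesseract_cmd explicitly")
else:
    print("tesseract found at", location)
```

If this prints a location, pytesseract should be able to find the executable without any extra configuration in the same environment.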

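First, install the binary for your platform (Linux, Mac, or Windows).

On Windows, download the binary, then add

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

to your script.

Then install the Python package with pip (see the installation section of its documentation).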

Install tesseract with pip. On Windows, use the following commands:

pip install tesseract

pip install tesseract-ocr

Then check the file stored on your system at

usr/appdata/local/programs/site packages/python/python36/lib/pytesseract/pytesseract.py

and compile the file.

You can install this package. After that, go to the path C:\Program Files (x86)\Tesseract-OCR\tesseract.exe and run the tesseract file.

I think this will help you.

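I see the steps scattered across different answers. Based on my recent experience with this pytesseract error on Windows, here are the steps in order, to make the error easier to resolve:

1. Install tesseract using the Windows installer.

2. Note the tesseract path from the installation. The default installation path at the time of this edit was C:\Users\USER\AppData\Local\Tesseract-OCR. It may change, so check the installation path.

3. pip install pytesseract

4. Set the tesseract path in the script before calling image_to_string:

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'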
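Because the install path differs between machines, step 2 can also be done in code. The following is a minimal sketch, assuming the executable sits in one of the locations mentioned in this thread. The list `CANDIDATE_PATHS` and the helper `first_existing` are hypothetical names, and the pytesseract assignment is left as a comment so the snippet runs without it.

```python
from pathlib import Path

# Candidate locations collected from the paths mentioned in this thread.
# They are assumptions about where an installer may have put tesseract.exe.
CANDIDATE_PATHS = [
    Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe"),
    Path(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"),
    Path.home() / "AppData" / "Local" / "Tesseract-OCR" / "tesseract.exe",
]


def first_existing(paths):
    """Return the first path that points at an existing file, or None."""
    for candidate in paths:
        if Path(candidate).is_file():
            return str(candidate)
    return None


found = first_existing(CANDIDATE_PATHS)
if found is None:
    print("tesseract.exe was not found in any candidate location")
else:
    print("Use this value for tesseract_cmd:", found)
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = found
```

If none of the candidates exists, the installer probably used a different directory, and the path has to be looked up by hand as described in step 2.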

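Step 1:

Install tesseract on your system according to your OS.

Step 2: Install the following dependency libraries with pip: pip install pytesseract, pip install opencv-python, pip install numpy

Step 3: Sample code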

import cv2
import numpy as np
import pytesseract
from PIL import Image

# Path of the working folder on disk. Replace with your working folder.
src_path = "C:\\Users\\<user>\\PycharmProjects\\ImageToText\\input\\"
# If you don't have the tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
# Note: this is only a Python variable. It does not set the TESSDATA_PREFIX
# environment variable that Tesseract itself reads.
TESSDATA_PREFIX = 'C:/Program Files (x86)/Tesseract-OCR'

def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)

    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)

    # Write image after removed noise
    cv2.imwrite(src_path + "removed_noise.png", img)

    #  Apply threshold to get image with only black and white
    #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)

    # Write the image after apply opencv to do some ...
    cv2.imwrite(src_path + "thres.png", img)

    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))

    # Remove template file
    #os.remove(temp)

    return result


print('--- Start recognize text from image ---')
print(get_string(src_path + "image.png") )

print("------ Done -------")


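For Windows only:

1 - You need to have Tesseract-OCR installed on your computer. Download the version suitable for your system.

2 - Add the Tesseract path to your system environment, i.e. edit the system variables.

3 - Run

pip install pytesseract

and

pip install tesseract

4 - Add this line to your Python script every time:

pytesseract.pytesseract.tesseract_cmd = 'C:/OCR/Tesseract-OCR/tesseract.exe'  # your path may be different

5 - Run the code.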

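On Windows, with a default Tesseract installation, the command path must be redirected.

On a 32-bit system, add a line after the import statements that sets pytesseract.pytesseract.tesseract_cmd to tesseract.exe under C:\Program Files (x86)\Tesseract-OCR.

On a 64-bit system, set it to tesseract.exe under C:\Program Files\Tesseract-OCR instead.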

On Mac, you can install it with Homebrew (brew install tesseract). This worked for me.

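If you get errors like the following:

 tesseract is not installed or it's not in your path

 and

 OSError: [Errno 12] Cannot allocate memory

then this is probably a swap memory allocation problem.

Allocating more swap memory may resolve it. Hope this helps :)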

On 64-bit Windows, just add the following to the PATH environment variable:

"C:\Program Files\Tesseract-OCR"

and it will work.
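The same idea can be tried from inside a script, without editing the system settings, by extending PATH for the current Python process only. This is a sketch under the assumption that the install directory is the one quoted above. `prepend_to_path` is a hypothetical helper, not part of pytesseract, and the change does not persist after the script exits.

```python
import os


def prepend_to_path(directory, environ=os.environ):
    """Put `directory` at the front of PATH unless it is already listed."""
    parts = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        environ["PATH"] = os.pathsep.join([directory] + parts)
    return environ.get("PATH", "")


# Demonstration on a throwaway mapping, so the real environment is untouched.
demo_env = {"PATH": os.pathsep.join(["first", "second"])}
prepend_to_path(r"C:\Program Files\Tesseract-OCR", demo_env)
print(demo_env["PATH"])

# To affect the running process, call it without the second argument:
# prepend_to_path(r"C:\Program Files\Tesseract-OCR")
```

Child processes started afterwards from the same script inherit the modified PATH, which is why this can help pytesseract find the executable.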

I was able to solve it by updating the tesseract_cmd variable in the pytesseract.py file with the bin/tesseract path.

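I had the same problem on Windows. I tried updating the environment variable with the tesseract path, but that did not work.

What worked for me was modifying pytesseract.py, which can be found at

C:\Program Files\Python37\Lib\site-packages\pytesseract

or, commonly, under

C:\Users\YOUR USER\APPDATA\Python

I changed one line, as follows:

#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:\Program Files\Tesseract-OCR\\tesseract.exe'

Note: I had to put an extra \ before tesseract, because otherwise Python interprets \t as a tab character and you get the following error message:

pytesseract.pytesseract.TesseractNotFoundError: C:\Program Files\Tessera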
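The effect of the unescaped `\t` can be reproduced without Tesseract at all. Here is a small sketch that uses a shortened path fragment rather than a real install location:

```python
# In a normal string literal, "\t" is a TAB character, so the path is silently corrupted.
broken = "Tesseract-OCR\tesseract.exe"

# Doubling the backslash, or using a raw string, keeps the backslash and the letter "t".
escaped = "Tesseract-OCR\\tesseract.exe"
raw = r"Tesseract-OCR\tesseract.exe"

print(repr(broken))
print(repr(raw))
print("escaped == raw:", escaped == raw)
print("broken contains a tab:", "\t" in broken)
```

A raw string (the `r'...'` prefix) is usually the least error-prone way to write Windows paths. Forward slashes, as used in some of the other answers, avoid the problem as well.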


# 32-bit system: raw string, so that \t is not read as a tab
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# 64-bit system
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

brew install tesseract

# {Windows 10 instructions}
# Before you use the script you need to install its dependencies.
# 1. Download tesseract from the official link:
#   https://github.com/UB-Mannheim/tesseract/wiki
# 2. Install tesseract.
#   I chose this path
#       (replace "user" in the path below with the user name on your machine):
#       C:\Users\user\AppData\Local\Tesseract-OCR\
# 3. Install pillow for your Python version.
# * I am using Python 3.7, and in my CMD I start this version by typing py -3.7.
# * If you are using another version of Python, first check how you start Python from your CMD;
#   on some machines the command is different.
    # [examples]
    # =================================
    # PYTHON VERSION 3.7: python, python3, python3.7, py -3, py -3.7
    # PYTHON VERSION 3.6: python, python3, python3.6, py -3, py -3.6
    # PYTHON VERSION 2.7: python, python2, python2.7, py -2, py -2.7
    # ================================
# We use pip to install the dependencies.
# Because I start Python 3.7 with
    # py -3.7
# open the CMD on the Windows machine and type the following line:
    # py -3.7 -m pip install pillow
# 4. Install pytesseract and tesseract for your Python version, again with pip.
# Open the CMD on the Windows machine and type the following lines:
    # py -3.7 -m pip install pytesseract
    # py -3.7 -m pip install tesseract


#!/usr/bin/python
from PIL import Image
import pytesseract
import os
import getpass

def extract_text_from_image(image_file_name_arg):

    # IMPORTANT
    # This assumes tesseract was installed to the path given in the instructions above.
    # If the path to tesseract.exe is wrong, the script will not work.
    # getpass.getuser() gets the user name for your machine automatically.
    username = getpass.getuser()
    tesseract_exe_path_installation = "C:\\Users\\" + username + "\\AppData\\Local\\Tesseract-OCR\\tesseract.exe"
    pytesseract.pytesseract.tesseract_cmd = tesseract_exe_path_installation

    # Specify the directory of your image files manually, or use the line below
    # if the images are in an "images" folder inside the current working directory.
    # image_dir = "D:\\GIT\\ai_example\\extract_text_from_image\\images"
    image_dir = os.path.join(os.getcwd(), "images")
    # If your images are in a different format, change the extension (e.g. ".png").
    image_ext = ".jpg"
    image_path_dir = os.path.join(image_dir, image_file_name_arg + image_ext)

    print("=============================================================================")
    print("image used is in the following path dir:")
    print("\t" + image_path_dir)
    print("=============================================================================")

    img = Image.open(image_path_dir)
    text = pytesseract.image_to_string(img, lang="eng")
    print(text)

# Change "image_2" to the name of your image, without the extension.
# image_file_name_arg = "image_1"
image_file_name_arg = "image_2"
# image_file_name_arg = "image_3"
# image_file_name_arg = "image_4"
# image_file_name_arg = "image_5"
extract_text_from_image(image_file_name_arg)

# ==================================
# CREATED BY: SHERIFI
# e-mail: sherif_co@yahoo.com
# git-link for script: https://github.com/sherifi/ai_example.git
# ==================================

For Ubuntu 18.04


sudo apt-get install tesseract-ocr


sudo apt-get install tesseract-ocr -y
sudo apt-get install tesseract-ocr-spa -y
tesseract --list-langs

List of available languages (3):
eng
osd
spa
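The language list above can also be read from Python, which helps when choosing the `lang` argument for `image_to_string`. This is a minimal sketch assuming the output format shown above. `parse_list_langs` and `installed_languages` are hypothetical helper names, and only the parser is exercised here, so the snippet runs even without Tesseract installed.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        codes.append(line)
    return codes


def installed_languages(tesseract_cmd="tesseract"):
    """Run the CLI and parse its output. Assumes the executable can be found."""
    proc = subprocess.run([tesseract_cmd, "--list-langs"], capture_output=True, text=True)
    # The list may arrive on stderr depending on the version, so both streams are scanned.
    return parse_list_langs(proc.stdout + "\n" + proc.stderr)


sample = """List of available languages (3):
eng
osd
spa
"""
print(parse_list_langs(sample))  # ['eng', 'osd', 'spa']
```

Any warning text that Tesseract prints alongside the list would be picked up as a "language" by this simple parser, so treat the result as a starting point.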


pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

conda install -c conda-forge tesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img_text = pytesseract.image_to_string(Image.open(filename))

pytesseract.pytesseract.tesseract_cmd =r'C:/Program Files/Tesseract-OCR/tesseract.exe'

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev