Python Pytesseract: "TesseractNotFoundError: tesseract is not installed or it's not in your path", how do I solve this?


I am trying to run a basic and very simple piece of code in Python:

from PIL import Image
import pytesseract

im = Image.open("sample1.jpg")

text = pytesseract.image_to_string(im, lang = 'eng')

print(text)
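
This is what it looks like. I have actually installed Tesseract for Windows through the installer. I am very new to Python and do not know how to proceed.

Any guidance here would be very helpful. I have tried restarting the Spyder application, but to no avail.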

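You need to have Tesseract installed on your system.

See the installation documentation above.

From the pytesseract README:

pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
# Include the line above if the tesseract executable is not in your PATH
# Example tesseract_cmd: r'C:\Program Files (x86)\Tesseract-OCR\tesseract'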
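
Before hard-coding tesseract_cmd, it can help to check whether the executable is discoverable at all. The following is a minimal sketch using only the standard library. The helper name find_tesseract and the candidate list are assumptions for illustration (the paths are the default locations quoted in this thread), not part of pytesseract.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a path to the tesseract executable, or None if none is found.

    Looks on PATH first, then tries each explicit candidate path.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Hypothetical candidate locations; adjust them to your own installation.
CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)

if __name__ == "__main__":
    print(find_tesseract(CANDIDATES) or "tesseract not found")
```

If the helper returns a path, that path is what you would assign to pytesseract.pytesseract.tesseract_cmd before calling image_to_string.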
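
First, you should install the binary:

On Linux:
sudo apt-get install tesseract-ocr

On Mac:
brew install tesseract

On Windows:
Download the binary from https://github.com/UB-Mannheim/tesseract/wiki. Then add
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
to your script.

Then you should install the Python package using pip. References: the pytesseract documentation (installation section) and the Tesseract installation documentation.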

Use the following commands to install tesseract on Windows:

pip install tesseract

pip install tesseract-ocr

Then check the pytesseract.py file stored on your system at
usr/appdata/local/programs/site-packages/python/python36/lib/pytesseract/pytesseract.py
and compile the file.

You can install this package... After that, go to the path C:\Program Files (x86)\Tesseract-OCR\tesseract.exe and run the tesseract file.
I think this will help you...

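I see the steps are scattered across different answers. Based on my recent experience with this pytesseract error on Windows, I am writing the steps out in order, to make the error easier to resolve:

1. Install tesseract using the Windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki

2. Note the tesseract path from the installation. The default installation path at the time of this edit was: C:\Users\USER\AppData\Local\Tesseract-OCR. It may change, so please check the installation path.

3. pip install pytesseract

4. Set the tesseract path in the script before calling image_to_string:

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'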
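
To confirm that the path noted during installation really points at a working executable, one option is to run it with --version before handing it to pytesseract. This is a minimal sketch, not part of the answer above: the helper name tesseract_version is hypothetical, and the install path is the default quoted in this answer, which may differ on your machine.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> --version`, or None if cmd cannot be run."""
    try:
        completed = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version to stderr
            universal_newlines=True,
            check=False,
        )
    except OSError:
        # Raised when the executable does not exist or cannot be started.
        return None
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


# Assumed install location; change it to your own path.
path = r"C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe"
print(tesseract_version(path) or "could not run " + path)
```

A result of None means the path is wrong or the file cannot be executed, which is the same situation that makes pytesseract raise TesseractNotFoundError.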
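
Step 1:

Install tesseract on your system according to your operating system. The latest installers can be found via the official Tesseract documentation.

Step 2: Install the following dependency libraries using pip:
pip install pytesseract
pip install opencv-python
pip install numpy

Step 3: Sample code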

import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string

# Path of the working folder on disk; replace with your own working folder
src_path = "C:\\Users\\<user>\\PycharmProjects\\ImageToText\\input\\"
# If the tesseract executable is not in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
# Note: as a plain Python variable this has no effect; Tesseract reads
# TESSDATA_PREFIX from the process environment (os.environ).
TESSDATA_PREFIX = 'C:/Program Files (x86)/Tesseract-OCR'

def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)

    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)

    # Write image after removed noise
    cv2.imwrite(src_path + "removed_noise.png", img)

    #  Apply threshold to get image with only black and white
    #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)

    # Write the image after apply opencv to do some ...
    cv2.imwrite(src_path + "thres.png", img)

    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))

    # Remove template file
    #os.remove(temp)

    return result


print('--- Start recognize text from image ---')
print(get_string(src_path + "image.png") )

print("------ Done -------")
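
For Windows only

1- You need to have Tesseract OCR installed on your computer.

Get it from https://github.com/UB-Mannheim/tesseract/wiki

Download the suitable version.

2- Add the Tesseract path to your system environment, i.e. edit the system variables.

3- Run
pip install pytesseract
pip install tesseract

4- Add this line to your Python script every time:

pytesseract.pytesseract.tesseract_cmd = 'C:/OCR/Tesseract-OCR/tesseract.exe'  # your path may be different

5- Run the code.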

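On Windows, with the default Tesseract installation, you have to redirect the command path.

  • On a 32-bit system, add this line after the import statements:

    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

  • On a 64-bit system, add this line instead:

    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

On Mac, you can install it as shown below. This worked for me:

    brew install tesseract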

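If you get errors like:

    tesseract is not installed or it's not in your path

and

    OSError: [Errno 12] Cannot allocate memory

this is probably related to a swap memory allocation problem.

You can check this answer on allocating more swap memory. Hope it helps :)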


On 64-bit Windows, just add the following to the PATH environment variable:
"C:\Program Files\Tesseract-OCR"
and it will work.
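
If editing the system-wide PATH is not an option, a similar effect can probably be had for a single script by extending PATH at runtime, before pytesseract launches the executable. This is only a sketch and not something the answer above describes: the helper name prepend_to_path is hypothetical, and the directory is the default 64-bit location quoted above.

```python
import os


def prepend_to_path(directory):
    """Put directory at the front of this process's PATH if it is not already listed."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + [e for e in entries if e])
    return os.environ["PATH"]


# Assumed default install directory on 64-bit Windows.
prepend_to_path(r"C:\Program Files\Tesseract-OCR")
```

The change applies only to the current process and the child processes it starts, so it has to run before the first pytesseract call in every script.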

I was able to solve it by updating the tesseract_cmd variable in the pytesseract.py file with the bin/tesseract path.

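I had the same problem on Windows. I tried updating the environment variable with the tesseract path, but that did not work.

What worked for me was modifying pytesseract.py, which can be found under
C:\Program Files\Python37\Lib\site-packages\pytesseract
or, usually, under
C:\Users\YOUR USER\APPDATA\Python

I changed one line, as follows:

    #tesseract_cmd = 'tesseract'
    tesseract_cmd = 'C:\Program Files\Tesseract-OCR\\tesseract.exe'

Note: I had to put an extra \ before tesseract, because otherwise Python interprets it as \t (a tab character), and you get the following error message:

    pytesseract.pytesseract.TesseractNotFoundError: C:\Program Files\Tessera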
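
The \t problem described in the note can be reproduced without Tesseract installed. The sketch below writes the same path as an ordinary string, with a doubled backslash, and as a raw string. The variable names are for illustration only.

```python
# Ordinary string: "\t" before "esseract" is read as a TAB character.
broken = "C:\\Program Files\\Tesseract-OCR\tesseract.exe"

# Doubling the backslash (the fix described in the note) keeps it literal.
escaped = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"

# A raw string avoids the need to escape anything.
raw = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

print("\t" in broken)   # True: the path contains a tab, so the file is never found
print(escaped == raw)   # True: both spell the same path
```

Raw strings or forward slashes are usually the least error-prone way to write Windows paths in Python source.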
    # {Windows 10 instructions}
    # Before you use the script, you need to install its dependencies.
    # 1. Download tesseract from the official link:
    #   https://github.com/UB-Mannheim/tesseract/wiki
    # 2. Install tesseract.
    #   I chose this path:
    #       * replace "user" in the path below with the user name you are using on your machine
    #       C:\Users\user\AppData\Local\Tesseract-OCR\
    # 3. Install pillow for your Python version.
    # * The way that works best for me is shown below (I am using Python 3.7, and in my CMD I run this version by typing py -3.7).
    # * If you are using another version of Python, first check how you start Python from your CMD.
    # * On some machines, Python is started from the CMD differently.
        # [examples]
        # =================================
        # PYTHON VERSION 3.7
        # python
        # python3.7
        # python -3.7
        # python 3.7
        # python3
        # python -3
        # python 3
        # py3.7
        # py -3.7
        # py 3.7
        # py3
        # py -3
        # py 3
        # PYTHON VERSION 3.6
        # python
        # python3.6
        # python -3.6
        # python 3.6
        # python3
        # python -3
        # python 3
        # py3.6
        # py -3.6
        # py 3.6
        # py3
        # py -3
        # py 3
        # PYTHON VERSION 2.7
        # python
        # python2.7
        # python -2.7
        # python 2.7
        # python2
        # python -2
        # python 2
        # py2.7
        # py -2.7
        # py 2.7
        # py2
        # py -2
        # py 2
        # ================================
    # We use pip to install the dependencies.
    # Because I start Python 3.7 with the following line:
        # py -3.7
    # open the CMD on your Windows machine and type the following line:
        # py -3.7 -m pip install pillow
    # 4. Install pytesseract and tesseract for your Python version.
    # * As above, I am using Python 3.7 and run it from my CMD by typing py -3.7.
    # We use pip to install the dependencies;
    # open the CMD on your Windows machine and type the following lines:
        # py -3.7 -m pip install pytesseract
        # py -3.7 -m pip install tesseract
    
    
    #!/usr/bin/python
    from PIL import Image
    import pytesseract
    import os
    import getpass
    
    def extract_text_from_image(image_file_name_arg):
    
        # IMPORTANT
        # If you followed the installation instructions in the comments above,
        # the path below matches my machine.
        # If you don't give the right path for tesseract.exe, the script will not work.
        username = getpass.getuser()
        # The line above gets the user name for your machine automatically.
        tesseract_exe_path_installation="C:\\Users\\"+username+"\\AppData\\Local\\Tesseract-OCR\\tesseract.exe"
        pytesseract.pytesseract.tesseract_cmd=tesseract_exe_path_installation
    
        # Specify the directory of your image files manually, or use the line below if the images are in an "images" folder in the current working directory
        # image_dir="D:\\GIT\\ai_example\\extract_text_from_image\\images"
        image_dir=os.getcwd()+"\\images"
        dir_seperator="\\"
        image_file_name=image_file_name_arg
        # if your image are in different format change the extension(ex. ".png")
        image_ext=".jpg"
        image_path_dir=image_dir+dir_seperator+image_file_name+image_ext
    
        print("=============================================================================")
        print("image used is in the following path dir:")
        print("\t"+image_path_dir)
        print("=============================================================================")
    
        img=Image.open(image_path_dir)
        text=pytesseract.image_to_string(img, lang="eng")
        print(text)
    
    # Change the name below to your image file name, without the extension
    # image_file_name_arg="image_1"
    image_file_name_arg="image_2"
    # image_file_name_arg="image_3"
    # image_file_name_arg="image_4"
    # image_file_name_arg="image_5"
    extract_text_from_image(image_file_name_arg)
    
    # ==================================
    # CREATED BY: SHERIFI
    # e-mail: sherif_co@yahoo.com
    # git-link for script: https://github.com/sherifi/ai_example.git
    # ==================================
    
    For Ubuntu 18.04
    
    sudo apt-get install tesseract-ocr
    
    brew install tesseract
    
    sudo apt-get install tesseract-ocr -y
    sudo apt-get install tesseract-ocr-spa -y
    tesseract --list-langs
    
    List of available languages (3):
    eng
    osd
    spa
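
The lang argument passed to image_to_string has to name one of the languages that tesseract --list-langs reports. As a rough sketch, that output can be parsed as follows. The function name parse_langs is hypothetical, and the sample text is the output shown above.

```python
def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first non-empty line is a header such as "List of available languages (3):".
    if lines and lines[0].lower().startswith("list of available languages"):
        lines = lines[1:]
    return lines


sample = """List of available languages (3):
eng
osd
spa
"""

print(parse_langs(sample))  # ['eng', 'osd', 'spa']
```

Checking that "eng" (or whichever language you request) is in the returned list is a quick way to rule out a missing language pack as the cause of an error.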
    
    pip install tesseract
    
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
    
    conda install -c conda-forge tesseract
    
     pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
     img_text = pytesseract.image_to_string(Image.open(filename))
    
    pytesseract.pytesseract.tesseract_cmd =r'C:/Program Files/Tesseract-OCR/tesseract.exe'
    
    sudo apt install tesseract-ocr
    sudo apt install libtesseract-dev