
Unable to extract text from a tif image using pytesseract in Python

python python-3.x


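I can't extract text from a .tif image file using pytesseract and PIL in Python. It works for .png and .jpg files; only .tif files produce an error. I'm using Python 3.7.1.

When I run my Python code on a .tif image file, it gives the following error. Please let me know what I'm doing wrong.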

Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
Traceback (most recent call last):
  File "C:/Users/u88ltuc/PycharmProjects/untitled1/Image Processing/Prog1.py", line 13, in <module>
    image_to_text = pytesseract.image_to_string(image, lang='eng')
  File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 347, in image_to_string
    }[output_type]()
  File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 346, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 246, in run_and_get_output
    with save(image) as (temp_name, input_filename):
  File "C:\Program Files\Python37\lib\contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 171, in save
    image.save(input_file_name, format=extension, **image.info)
  File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\PIL\Image.py", line 2102, in save
    save_handler(self, fp, filename)
  File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\PIL\TiffImagePlugin.py", line 1626, in _save
    raise OSError("encoder error %d when writing image file" % s)
OSError: encoder error -2 when writing image file

Here is the .tif image and its link:

First, try converting the image to another format such as JPEG. This may solve your problem:

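from io import BytesIO

from PIL import Image
import pytesseract

# Open the TIFF that pytesseract could not process directly.
img = Image.open(r"C:\Users\u88ltuc\Desktop\12110845-e001.tif")

# Re-encode it as JPEG in memory, so nothing is written to disk.
buffer = BytesIO()
img.save(buffer, format="JPEG")
buffer.seek(0)

# Reopen the JPEG bytes as a new image and run OCR on it.
img = Image.open(buffer)
print(pytesseract.image_to_string(img))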
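
JPEG is lossy, which can soften the sharp edges of scanned text. As a variation on the in-memory approach above, the sketch below re-encodes to PNG by default. The helper name `reencode_in_memory` and the PNG default are assumptions made for illustration, not part of the original answer, and the sketch has not been run against the asker's file.

```python
from io import BytesIO

from PIL import Image


def reencode_in_memory(img, fmt="PNG"):
    """Return a copy of `img` re-encoded as `fmt` without touching the disk.

    Hypothetical helper: the name and the PNG default are illustrative.
    """
    # JPEG cannot store palette or alpha data, so fall back to RGB there.
    if fmt.upper() == "JPEG" and img.mode not in ("1", "L", "RGB", "CMYK"):
        img = img.convert("RGB")

    buffer = BytesIO()
    # Only `format` is passed, so the TIFF's own metadata in img.info
    # is not forwarded as save options.
    img.save(buffer, format=fmt)
    buffer.seek(0)

    reloaded = Image.open(buffer)
    reloaded.load()  # read the pixel data now, while the buffer is in scope
    return reloaded
```

With pytesseract installed, the result can be passed on as before, for example `pytesseract.image_to_string(reencode_in_memory(img))`.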

Alternatively, if you don't mind having two copies of the same picture on your desktop, you can skip the BytesIO import. Here it is:

from PIL import Image
import pytesseract

# Save a JPEG copy next to the original TIFF, then run OCR on the copy.
img = Image.open(r"C:\Users\u88ltuc\Desktop\12110845-e001.tif")
img.save(r"C:\Users\u88ltuc\Desktop\12110845-e001.jpg")
img = Image.open(r"C:\Users\u88ltuc\Desktop\12110845-e001.jpg")

print(pytesseract.image_to_string(img))
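
If the extra copy on the desktop is unwanted, the same save-and-reopen idea can be sketched with a temporary directory that is deleted afterwards. The function name `ocr_via_temp_copy`, the `ocr` parameter, and the choice of PNG are assumptions made for illustration; `ocr` defaults to `pytesseract.image_to_string`.

```python
import os
import tempfile

from PIL import Image


def ocr_via_temp_copy(tif_path, ocr=None):
    """Convert the TIFF to a temporary PNG, run OCR on it, then clean up.

    Hypothetical helper: `ocr` is any callable that takes a PIL image.
    """
    if ocr is None:
        # Imported lazily so the conversion step can be tried without Tesseract.
        import pytesseract
        ocr = pytesseract.image_to_string

    with tempfile.TemporaryDirectory() as tmp_dir:
        png_path = os.path.join(tmp_dir, "converted.png")

        # The output format is inferred from the .png extension.
        with Image.open(tif_path) as src:
            src.save(png_path)

        # Run OCR while the temporary file still exists; both the image
        # and the directory are closed and removed on exit.
        with Image.open(png_path) as converted:
            return ocr(converted)
```

A call such as `print(ocr_via_temp_copy(r"C:\Users\u88ltuc\Desktop\12110845-e001.tif"))` would leave the desktop untouched.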

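I'm not sure whether this is the same issue.

Hello abhik, I checked the git link you provided, but I don't understand their solution, and I don't think it's the same problem.

Then why not change the image's extension? You can do that easily with just the PIL module.

@jizhaosama, can you help me extract text from the image using just the PIL module? I don't understand how to do that.

Thank you so much, Jizhaosama, your code works great. I've been looking for a solution for the past 3 months.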
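
For anyone unsure why re-encoding helps: the traceback in the question shows pytesseract calling `image.save(..., **image.info)`, so whatever metadata Pillow read from the TIFF is handed back to the encoder as save options. One plausible reading, which is an assumption and not confirmed in this thread, is that a Group 3/4 fax compression setting in `img.info` clashes with the pixel mode being written. The sketch below only inspects the file; `describe_image` is a hypothetical helper name.

```python
from PIL import Image


def describe_image(img):
    """Summarize the fields that matter when an image is re-saved.

    Hypothetical helper: it reports values and changes nothing.
    """
    return {
        "format": img.format,  # e.g. "TIFF"; None for images created in memory
        "mode": img.mode,      # "1" means 1 bit per pixel
        # Pillow's TIFF reader records the compression scheme here, if any.
        "compression": img.info.get("compression"),
    }
```

Running `print(describe_image(Image.open(path)))` on the failing file shows whether a fax compression scheme such as `group4` is present in its metadata.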
