Python 3.x OSError: [Errno 12] Cannot allocate memory with pytesseract

I am facing a problem. I am running a Python script that converts each PDF to images and then runs Tesseract (via pytesseract) on those images:

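import io

import pytesseract
from PIL import Image
from wand.image import Image as wi  # assumed: `wi` is Wand's Image class

# path_list (the PDF paths to process) is built elsewhere in the script
for filename in path_list:
    print(filename)
    pdfFile = wi(filename=filename, resolution=300)
    image = pdfFile.convert('jpeg')
    imageBlobs = []

    # Render each PDF page to an in-memory JPEG blob
    for img in image.sequence:
        imgPage = wi(image=img)
        imageBlobs.append(imgPage.make_blob('jpeg'))

    extract = []

    # OCR each page image with Tesseract and keep the text
    for imgBlob in imageBlobs:
        image = Image.open(io.BytesIO(imgBlob))
        text = pytesseract.image_to_string(image, lang='eng')
        extract.append(text)
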
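One way to check whether memory really accumulates as the loop moves from one PDF to the next is to log the process's peak resident set size after each file. This is a minimal sketch assuming the per-file body of the loop has been moved into a function (called `process_one` here, a hypothetical name); it uses only the standard-library `resource` module, which is available on Unix-like systems such as Ubuntu.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak /= 1024  # bytes -> KiB
    return peak / 1024  # KiB -> MiB


def process_with_memory_log(path_list, process_one):
    """Call process_one(path) for each path and record the peak RSS after it.

    process_one is a hypothetical callback standing in for the per-PDF work.
    """
    log = []
    for path in path_list:
        process_one(path)
        log.append((path, peak_rss_mib()))
        print("%s: peak RSS %.1f MiB" % (path, log[-1][1]))
    return log
```

If the logged peak keeps climbing with every PDF instead of levelling off after the first few, memory is accumulating across iterations rather than being freed per file.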
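After extracting the content of 11 PDFs, I get the error below. It is not a problem with the PDF files themselves: when I pass a particular PDF on its own, its content is extracted without any issue. I am running the script on Ubuntu 16.04.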
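Since each PDF works on its own but the run fails after several files, one workaround worth trying (a sketch, not a confirmed fix) is to handle each PDF in a short-lived child process, so that everything one file allocates is returned to the OS when that child exits, and the long-running parent that starts the next file stays small. `ocr_pdf` below is a hypothetical placeholder for the per-file body of the loop in the question; it returns dummy text so the sketch runs on its own. The explicit `"fork"` start method is an assumption that matches the Linux default on the Python 3.5 used here.

```python
import multiprocessing


def ocr_pdf(path):
    """Hypothetical stand-in for the per-PDF work (Wand rendering + pytesseract).

    Returns placeholder text so this sketch is runnable without those libraries.
    """
    return "text of %s" % path


def _worker(path, queue):
    # Runs in the child: do the heavy work and send back only the extracted text.
    queue.put(ocr_pdf(path))


def ocr_all(path_list):
    """OCR each PDF in its own child process, one file at a time."""
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    results = []
    for path in path_list:
        proc = ctx.Process(target=_worker, args=(path, queue))
        proc.start()
        # Read the result before join() so a large result cannot block the child.
        results.append(queue.get())
        proc.join()  # the child's memory is released when it exits
    return results
```

`queue.get()` waits indefinitely if a child dies before sending its result; a longer-running version would pass a `timeout` and check `proc.exitcode`.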

Any help would be appreciated.

Error:
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "ocr_script.py", line 466, in <module>
    gather_details(path_list)
  File "ocr_script.py", line 45, in gather_details
    discover_data('Indexing',discoveryPath,final_meta,start_time)
  File "ocr_script.py", line 165, in discover_data
    text = pytesseract.image_to_string(image, lang='eng')
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 294, in image_to_string
    return run_and_get_output(*args)
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 202, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 172, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: /usr/bin/tesseract is not installed or it's
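Reading the traceback from the bottom up, the `TesseractNotFoundError` is raised at line 172 of `pytesseract.py` right after the `OSError` from `subprocess.Popen` at line 170, so the "not installed" message appears to be a side effect of the installed pytesseract version treating any `OSError` as a missing binary. The underlying failure is Errno 12 (`ENOMEM`): the tesseract child process could not be started for lack of memory. The standard-library sketch below shows how a caller could tell these cases apart by checking `errno`; `describe_popen_failure` and `run_external` are illustrative names, not pytesseract functions.

```python
import errno
import subprocess


def describe_popen_failure(exc):
    """Map an OSError raised while starting a subprocess to a readable cause."""
    if exc.errno == errno.ENOENT:
        return "executable not found (check the tesseract path)"
    if exc.errno == errno.ENOMEM:
        return "not enough memory to start the child process"
    return "other OS error: %s" % exc


def run_external(cmd_args):
    """Run cmd_args and return its stdout, re-raising start-up failures clearly.

    A non-zero exit status still raises subprocess.CalledProcessError,
    which is not an OSError and is therefore passed through unchanged.
    """
    try:
        proc = subprocess.run(cmd_args, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, check=True)
    except OSError as exc:
        raise RuntimeError(describe_popen_failure(exc)) from exc
    return proc.stdout
```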

After further analysis and tweaking, I concluded that the problem lies with my Tesseract setup rather than with the OS. The changes I made:

  • Edited the policy.xml file in /etc/ImageMagick..(version)

  • These are the parameters whose memory limits I increased.

    Possibly answered here. @Dmitri: I found here that the memory issue is related to Tesseract.
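The question does not list the exact values changed in policy.xml. To check which resource limits a given policy.xml actually declares (for example before and after editing it), a small standard-library parser like this one can help. It assumes ImageMagick's usual `<policy domain="resource" name="..." value="..."/>` entries; the function name `resource_limits` is illustrative, and the values in the test sample are made up for the example, not the author's settings.

```python
import xml.etree.ElementTree as ET


def resource_limits(policy_xml_text):
    """Return the resource limits declared in an ImageMagick policy.xml document.

    Only <policy domain="resource" .../> entries are collected; coder and path
    policies are skipped, and commented-out entries are ignored by the parser.
    """
    root = ET.fromstring(policy_xml_text)
    limits = {}
    for policy in root.iter("policy"):
        if policy.get("domain") == "resource":
            limits[policy.get("name")] = policy.get("value")
    return limits
```

Passing it the text of the policy.xml that was edited returns a dict such as `{"memory": ..., "map": ...}`, which makes it easy to confirm that the new limits are the ones ImageMagick will read.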