Python: freeing memory used by pytesseract

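I'm trying to streamline an OCR project that I've been working on for a while. I'm currently profiling the memory usage of the function that does most of the work, and I've identified a section of my code where I think memory usage could be reduced.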

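Below is the relevant memory profiler output. The two lines of most concern are 796 and 802: line 796 detects the orientation of the image, and line 802 rotates the image if the orientation is not 0. The memory allocated on line 802 is reclaimed later, at lines 818/819, via del img and gc.collect(). Unfortunately, I can't free the memory allocated on line 796, which is used by pytesseract's image_to_osd function (I tried at lines 808/809, but no memory was released). This tool has been the most accurate one I've found for determining page orientation, but I'd like to know whether there is an alternative with a smaller memory footprint.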

Line #    Mem usage    Increment   Line Contents
================================================
   790    164.7 MiB    164.7 MiB   @profile
   791                             def extract(output_fmt, output_path, pdf_path, img, page_num):
   792
   793    164.7 MiB      0.0 MiB       prep_instance = imagePrep()
   794
   795                                 # use 'image_to_osd' to detect orientation--retrieve it using 'img_props['orientation']
   796    197.2 MiB     32.5 MiB       orientation = pytesseract.image_to_osd(img, output_type = pytesseract.Output.DICT)['orientation']
   797    197.2 MiB      0.0 MiB       print("\n\tPage orientation acquired.")
   798
   799                                 # 'output_name' will be the name of the file given by 'pdf_path' retrieved using string manipulation to get the file name
   800    197.2 MiB      0.0 MiB       output_name = pdf_path.rsplit("/", 1)[-1].split(".")[0] + "_page" + str(page_num + 1)
   801    197.2 MiB      0.0 MiB       if not (orientation == 0):
   802    229.2 MiB     32.0 MiB           img = img.rotate(orientation, expand = True)
   803    229.2 MiB      0.0 MiB           print("\n\tPage " + str(page_num + 1) + " rotated.")
   804    229.2 MiB      0.0 MiB           output_name_complete = output_name + "_rotated." + output_fmt
   805                                 else:
   806                                     output_name_complete = output_name + "." + output_fmt
   807
   808    229.2 MiB      0.0 MiB       del orientation
   809    229.2 MiB      0.0 MiB       gc.collect()
   810
   811                                 # create a folder that will store all images (cropped, items, vendor, etc.) for the page currently being worked on
   812    229.2 MiB      0.0 MiB       path_save_folder = create_directory(output_path, output_name)
   813    229.2 MiB      0.0 MiB       path_save = os.path.join(path_save_folder, output_name_complete)
   814
   815    229.3 MiB      0.0 MiB       img.save(path_save)
   816    229.3 MiB      0.0 MiB       print("\n\tPage " + str(page_num + 1) + " saved in '" + output_fmt + "' format.")
   817
   818    197.3 MiB      0.0 MiB       del img
   819    197.3 MiB      0.0 MiB       gc.collect()
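
One detail in the profile above is worth noting: orientation on line 796 is only a small integer, so del orientation at line 808 has almost nothing to free, and the 32.5 MiB is presumably held somewhere else. A way to narrow this down is to compare what Python's own allocator reports with the process-level figure from the profiler. The sketch below uses the standard-library tracemalloc module, which only sees allocations made through Python's allocators. It uses a 32 MiB bytearray as a stand-in for image data (an assumption for illustration; it does not call pytesseract).

```python
import gc
import tracemalloc

MIB = 1024 * 1024


def traced_mib():
    """Return the memory currently traced by tracemalloc, in MiB."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / MIB


tracemalloc.start()
baseline = traced_mib()

# Stand-in for a ~32 MiB image buffer (hypothetical; not real image data).
buffer = bytearray(32 * MIB)
after_alloc = traced_mib() - baseline

# Dropping the last reference frees the buffer immediately; gc.collect()
# only matters for objects caught in reference cycles.
del buffer
gc.collect()
after_free = traced_mib() - baseline

tracemalloc.stop()

print(f"traced after allocation:         {after_alloc:.1f} MiB")
print(f"traced after del + gc.collect(): {after_free:.1f} MiB")
```

If the same measurement around the image_to_osd call shows the traced figure dropping while the profiler's figure stays high, the memory is probably being retained below the Python object level (by the allocator or a native library) rather than by a live Python reference.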

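I also realize that a ~30 MB increase is not large on its own, but in some cases I need to fix the page orientation for 80 or more images, which adds up to roughly 2.4 GB of additional memory. I tested with 56 images, and my program threw an OpenCV out-of-memory error on the 34th image because it could not allocate any more memory.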
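
As a quick check of that estimate, the arithmetic can be written out, assuming the per-image increment is never released and accumulates linearly (the figures are the approximate ones quoted above):

```python
# Rough estimate of accumulated memory if the per-image increment is never freed.
images = 80            # pages needing orientation handling
increment_mib = 30     # approximate increase per image, from the profile

total_mib = images * increment_mib   # 2400 MiB
total_gib = total_mib / 1024         # about 2.34 GiB

print(f"{total_mib} MiB is about {total_gib:.2f} GiB")
```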

Does anyone have suggestions for minimizing RAM usage, or ideas as to why I can't free the memory at line 808 as shown above?
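
If the memory cannot be released from inside the process, one general pattern (not specific to pytesseract, and not confirmed as a fix for this case) is to run the memory-hungry step in a short-lived child process and pass back only the small result; the operating system reclaims the child's memory when it exits. In the sketch below the worker is a hypothetical stand-in: it allocates a scratch buffer and prints a hard-coded orientation of 90 instead of calling image_to_osd. The names WORKER_SOURCE and detect_orientation_in_child are illustrative only.

```python
import subprocess
import sys

# Hypothetical worker: in a real version this would load the page image from a
# file path and run orientation detection. Here it only simulates the memory
# use and prints a fixed result.
WORKER_SOURCE = """
scratch = bytearray(32 * 1024 * 1024)  # stand-in for memory used during detection
print(90)                               # stand-in for the detected orientation
"""


def detect_orientation_in_child():
    """Run the worker in a separate interpreter and return its integer result."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the worker fails
    )
    return int(result.stdout.strip())


orientation = detect_orientation_in_child()
print("orientation:", orientation)
```

The parent process only ever holds the integer, so whatever the worker allocated cannot accumulate across pages. The trade-off is the cost of starting a new interpreter per page and of passing the image through a file rather than as an in-memory object.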