Python: using pdfminer, code gets stuck at interpreter.process_page(page) and never terminates or throws an error

python python-3.x timer

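I'm having trouble with the PDFPageInterpreter in pdfminer. So far the following code has worked for every PDF file I've come across, but I recently found that when it hits a PDF page containing a very large amount of text (for example, a dense data table set in 3pt font), my code gets stuck on the line below, neither continuing nor throwing an error: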

interpreter.process_page(page)

Code:

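Once I have the layout object I'm home free, but without running the interpreter first, the layout tree returned by device.get_result() is almost empty. Does anyone know of a way to get the interpreter to run on these extremely information-dense pages? That would be my ideal solution, but if it isn't possible, does anyone know how to put a timer on a function so that, if it gets stuck on that line of code, execution moves on? I tried the code below, but it ended up spawning a large number of subprocesses that never joined properly: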

import multiprocessing
import time

from pdfminer.pdfpage import PDFPage

# `document`, `interpreter` and `device` come from the pdfminer setup
# code (not shown here).

# Now that we have everything needed to process a PDF document,
# process it page by page.
for page in PDFPage.create_pages(document):
    # Run the interpreter on this page in a separate process.
    p = multiprocessing.Process(target=interpreter.process_page,
                                args=(page,))
    p.start()
    # Wait for 10 seconds or until the process finishes.
    p.join(10)

    # Give up on the page if it took longer than 10 seconds to
    # interpret.
    if p.is_alive():
        p.terminate()
        p.join()
        time.sleep(1)
        continue

    # The device renders the layout from the interpreter.
    layout = device.get_result()

Did you ever find a solution?
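
For the timer part of the question, here is a minimal sketch of one possible approach. It is not a confirmed fix from the original poster. One complication with the multiprocessing attempt above is that `multiprocessing.Process` runs its target in a separate process, so whatever `interpreter.process_page` writes into `device` ends up in the child's copy and `device.get_result()` in the parent would not see it. The sketch therefore sends the function's return value back over a pipe. The helper `run_with_timeout` and the two demo functions are hypothetical names introduced for illustration, and the `fork` start method assumes a Unix-like system.

```python
import multiprocessing
import time


def _child(conn, func, args):
    """Run func(*args) in the child and send the outcome to the parent."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:  # report failures instead of dying silently
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_timeout(func, args=(), timeout=10.0):
    """Return (finished, value); finished is False if the call timed out."""
    # Assumption: a Unix-like OS. With "fork" the child inherits func and
    # args directly, so they do not need to be picklable.
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent only reads from the pipe
    try:
        # Wait for a result (or for the child to die) up to `timeout` seconds.
        if not parent_conn.poll(timeout):
            return False, None
        try:
            status, value = parent_conn.recv()
        except EOFError:
            raise RuntimeError("child exited without sending a result")
        if status == "error":
            raise RuntimeError(value)
        return True, value
    finally:
        # Kill the child if it is still running, then reap it.
        if proc.is_alive():
            proc.terminate()
        proc.join()
        parent_conn.close()


# Hypothetical stand-ins for a page that parses quickly and one that hangs.
def quick_page(x):
    return x * 2


def stuck_page():
    while True:
        time.sleep(0.05)
```

In the loop from the question, a small top-level function that calls `interpreter.process_page(page)` and then returns data derived from `device.get_result()` could be passed as `func`, with the page skipped whenever `finished` is False. The return value has to be picklable to cross the pipe; whether pdfminer's layout objects pickle cleanly is an assumption worth checking, and returning plain strings or tuples extracted in the child may be the safer choice.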
