Converting a PDF with Python

python imagemagick

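I have the following code that creates multiple JPGs from a multi-page PDF. However, I get the following error:

wand.exceptions.BlobError: unable to open image `{uuid}.jpg': No such file or directory @ error/blob.c/OpenBlob/2841

even though the image has been created. I initially thought it might be a race condition, so I added a time.sleep(), but that did not help either, so I no longer believe that is the cause. Has anyone seen this before?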

import subprocess
import time
import uuid

import PyPDF2
from wand.image import Image


def split_pdf(pdf_obj, step_functions_client, task_token):
    print(time.time())

    read_pdf = PyPDF2.PdfFileReader(pdf_obj)
    images = []

    for page_num in range(read_pdf.numPages):
        output = PyPDF2.PdfFileWriter()
        output.addPage(read_pdf.getPage(page_num))

        generateduuid = str(uuid.uuid4())
        filename = generateduuid + ".pdf"
        outputfilename = generateduuid + ".jpg"
        with open(filename, "wb") as out_pdf:
            output.write(out_pdf) # write to local instead

        image = {"page": str(page_num + 1)}  # Start at 1 rather than 0

        create_image_process = subprocess.Popen(["gs", "-o " + outputfilename, "-sDEVICE=jpeg", "-r300", "-dJPEGQ=100", filename], stdout=subprocess.PIPE)
        create_image_process.wait()

        time.sleep(10)
        with(Image(filename=outputfilename)) as img:
            image["image_data"] = img.make_blob('jpeg')
            image["height"] = img.height
            image["width"] = img.width
            images.append(image)

            if hasattr(step_functions_client, 'send_task_heartbeat'):
                step_functions_client.send_task_heartbeat(taskToken=task_token)

    return images

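It looks like no value is being passed in when you first try to open the PDF, which is why you get the error.

Make sure you format the string with the full file path, e.g.

f'/path/to/file/{uuid}.jpg'

or

'/path/to/file/{}.jpg'.format(uuid)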
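As a minimal sketch of the full-path suggestion, the helper below derives absolute PDF and JPG paths from a single generated UUID. The function name `output_paths` and the `work_dir` parameter are illustrative assumptions and do not appear in the question's code.

```python
import uuid
from pathlib import Path


def output_paths(work_dir):
    """Return absolute (pdf_path, jpg_path) that share one freshly generated UUID stem."""
    base = Path(work_dir).resolve()  # make the directory absolute
    stem = str(uuid.uuid4())
    return base / f"{stem}.pdf", base / f"{stem}.jpg"


pdf_path, jpg_path = output_paths(".")
print(pdf_path)
print(jpg_path)
```

Both paths can then be converted with `str()` wherever a plain string file name is expected.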

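I don't really see why you are using PyPDF2, Ghostscript, and Wand together. You are not parsing or manipulating any PostScript, and Wand sits on top of ImageMagick, which in turn sits on top of Ghostscript. You can reduce the function to a single PDF utility: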

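from wand.image import Image


def split_pdf(pdf_obj, step_functions_client, task_token):
    images = []
    # Let Wand read the whole PDF; each entry in document.sequence is one page.
    with Image(file=pdf_obj, resolution=300) as document:
        for index, page in enumerate(document.sequence):
            image = {
                "page": index + 1,  # start at 1 rather than 0
                "height": page.height,
                "width": page.width,
            }
            # Copy the page into a standalone image so it can be encoded as JPEG.
            with Image(page) as frame:
                image["image_data"] = frame.make_blob("JPEG")
            images.append(image)
            if hasattr(step_functions_client, 'send_task_heartbeat'):
                step_functions_client.send_task_heartbeat(taskToken=task_token)
    return images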

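The example code has no error handling. PDFs can be generated by software from many vendors, and a lot of them do sloppy work. There is a good chance that PyPDF or Ghostscript will fail without you ever getting the chance to handle it.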

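For example, when I run Ghostscript on PDFs generated by random websites, I often see the following message on stderr:

ignoring zlib error: incorrect data check

... which results in incomplete documents or blank pages.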
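A minimal sketch of what such error handling could look like: run the external command, fail loudly on a non-zero exit code, and hand back anything written to stderr so that warnings like the one above can be logged or inspected. The helper name `run_and_check` is an assumption for illustration, and a Python child process stands in for `gs` so the sketch runs without Ghostscript installed.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd, raise on a non-zero exit code, and return whatever it wrote to stderr."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stderr


# Stand-in command (hypothetical): a child Python process that only emits a warning.
warnings = run_and_check(
    [sys.executable, "-c", "import sys; sys.stderr.write('warning\\n')"]
)
if warnings:
    print("stderr output:", warnings.strip())
```

Note that a zero exit code does not guarantee a clean conversion, which is why the stderr text is returned rather than discarded.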

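Another common case is that system resources are exhausted and no additional memory can be allocated. This happens a lot on web servers, and the usual solution is to move the task to a queue worker that can shut down completely after each task finishes.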

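Thanks for the answer, but the script creates all the necessary files with the expected output. It gets hung up on `with(Image(filename=outputfilename)) as img:`. Providing the full path did not help either.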
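Since the files do get created yet Wand cannot open them by name, one thing worth ruling out is the `"-o " + outputfilename` argument in the question's `Popen` call. This is a guess, not something confirmed in this thread: that string reaches Ghostscript as a single argument containing a space, so the output may end up under a name that differs from `outputfilename`, for instance one with a leading space. Listing the working directory after a run would show whether that is the case. The sketch below passes the flag and the file name as separate list items; the helper name `build_gs_command` and its `dpi` parameter are assumptions for illustration.

```python
def build_gs_command(pdf_path, jpg_path, dpi=300):
    """Build a Ghostscript argument list that renders pdf_path to a JPEG at jpg_path."""
    return [
        "gs",
        "-o", str(jpg_path),  # flag and file name as two separate arguments
        "-sDEVICE=jpeg",
        f"-r{dpi}",
        "-dJPEGQ=100",
        str(pdf_path),
    ]


print(build_gs_command("input.pdf", "output.jpg"))
```

The resulting list can be passed to `subprocess.run` or `subprocess.Popen` in place of the list in the question.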