Python pdf2image from link: "Unable to get page count"

I have a link to a PDF that I want to convert to an image, so I ran this:

import requests
import pdf2image
x = "https://www.criticallink.com/wp-content/uploads/ISO-9001-2015-Certificate.pdf"
pdf = requests.get(x,stream=True,timeout=30)
images = pdf2image.convert_from_bytes(pdf.raw.read())

But I got this error:

PDFPageCountError: Unable to get page count.
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error (19): Illegal character '>'
Syntax Error (46): Illegal character ')'
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table

What should I do?

Update:

pdf.raw.read()[:100]
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x9c\xfdy@Rk\xfc>\x8a\xbe\x80\x8a\x9aC\x03\x15\x9aSY\n\xb5SI\xcaY\xd1\xb6\x13N\x80\xed\xdc\x91\x99i)X\x9a8U\x98\x8a\xd9\xb4\xd9\x84\x9a\x94F\x0e\x14\xa0\xb5\xcb\xac\x1d\xa6V\xa6\rH\xb5\xb7\xa2hZ6\x99\x9aJdj\xe2\x909\xdc\xe5\xfe}\xcf\xf9\xdd{\xcf\xf9\xe3\x9e\xbb\xfa#'

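So you are getting a GZIP-encoded response (the bytes start with \x1f\x8b, the gzip magic number, rather than %PDF). Try the following: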

import gzip
import requests
import pdf2image
url = "https://www.criticallink.com/wp-content/uploads/ISO-9001-2015-Certificate.pdf"
response = requests.get(url, stream=True, timeout=30)
pdf = gzip.open(response.raw)
images = pdf2image.convert_from_bytes(pdf.read())

Alternatively, you can use

import requests
import pdf2image
url = "https://www.criticallink.com/wp-content/uploads/ISO-9001-2015-Certificate.pdf"
response = requests.get(url, timeout=30)
images = pdf2image.convert_from_bytes(response.content)

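and let requests decode it for you: response.content is gzip-decoded automatically, whereas response.raw gives you the bytes exactly as they came over the wire.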
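
If you have to keep reading from the raw stream, a small guard can make the failure easier to diagnose. The following is only a sketch: to_pdf_bytes is a hypothetical helper name, not part of requests or pdf2image, and it assumes the PDF header sits at the very start of the decoded payload.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
PDF_MAGIC = b"%PDF-"       # assumed to be at offset 0 of a valid PDF


def to_pdf_bytes(data: bytes) -> bytes:
    """Return PDF bytes, decompressing first if the payload is gzip-compressed.

    Hypothetical helper for illustration only.
    """
    if data[:2] == GZIP_MAGIC:
        # The server sent Content-Encoding: gzip and nothing decoded it yet.
        data = gzip.decompress(data)
    if not data.startswith(PDF_MAGIC):
        raise ValueError("payload does not look like a PDF")
    return data
```

With this in place, the original call could become pdf2image.convert_from_bytes(to_pdf_bytes(pdf.raw.read())), which raises a clear ValueError instead of a PDFPageCountError when the download is not a PDF at all.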

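Print the first 100 bytes of pdf.raw.read() so you/we can see what you are getting.

@JustinEzequiel pdf.raw.read() outputs b''

That is because you had already read everything. Do the print before calling convert_from_bytes.

@JustinEzequiel Updated.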
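
The empty b'' result mentioned in the comments can be reproduced without any network access. This sketch uses io.BytesIO as a stand-in for response.raw (an assumption for illustration; the real object is a urllib3 response, but it is consumed the same way):

```python
import io

# Stand-in for response.raw: a byte stream that can only be consumed once.
stream = io.BytesIO(b"%PDF-1.7 example payload")

first = stream.read()    # consumes the whole stream
second = stream.read()   # nothing left, so this is empty

print(first[:8])
print(second)
```

Reading the stream once into a variable and reusing that variable for both the debug print and convert_from_bytes avoids the problem.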