Python 3.x: How do I scrape this PDF file?

python-3.x

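I want to scrape the table in this Persian PDF file and get the result as a DataFrame, but I get the error "NameError: name 'PDFResourceManager' is not defined", and no usable content is extracted. Please help me find a working solution. Including code you have tested would be appreciated.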

from pdfminer.converter import TextConverter
from io import StringIO
from io import open
from urllib.request import urlopen
import pdfminer as pm

urlpdf="https://www.codal.ir/Reports/DownloadFile.aspx?id=jck8NF9OtmFW6fpyefK09w%3d%3d"
response = requests.get(urlpdf, verify=False, timeout=5)
f=io.BytesIO(response.content)
def readPDF(f):
    rsrcmgr=PDFResourceManager()
    retstr=StringIO()
    laparams=LAParams()
    device=TextConverter(rsrcmgr,retstr,laparams=laparams)
    process_pdf(rsrcmgr,device,pdfFile)
    device.close()
    content=retstr.getvalue()
    retstr.close()
    return content
pdfFile=urlopen(urlpdf)
outputString=readPDF(pdfFile)

proceedings=outputString.encode('utf-8') # creates a UTF-8 byte object
proceedings=str(proceedings) # creates string representation <- the source of your issue
file=open("extract.txt","w", encoding="utf-8") # encodes str to platform specific encoding.
file.write(proceedings)
file.close()
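
Note on the last block of the snippet: the inline comments already point at a second problem that is independent of the NameError. Calling `str()` on a `bytes` object returns its printable representation (`b'...'` with escape sequences), not the decoded text, so the file would contain escape codes instead of Persian characters. A minimal sketch of the difference, using a stand-in string in place of the real `readPDF()` result and a temporary path chosen only for illustration:

```python
import os
import tempfile

# Stand-in for the text returned by readPDF(); an assumption for this demo.
output_string = "سلام PDF"

# str() on bytes gives the repr ("b'\\xd8\\xb3...'"), not the original text.
mangled = str(output_string.encode("utf-8"))

# Writing the str directly lets open() apply the UTF-8 encoding exactly once.
path = os.path.join(tempfile.gettempdir(), "extract.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(output_string)

# Reading the file back with the same encoding returns the original text.
with open(path, encoding="utf-8") as fh:
    round_tripped = fh.read()
```

In other words, the `encode` and `str` lines can simply be dropped, and the extracted string written as it is.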

Where do you expect PDFResourceManager to come from? As the error message says, it is not defined anywhere in your code.

That is exactly my question. If I knew, I would have solved the problem myself.
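
To make the point in the comments concrete: the snippet imports only `TextConverter`, so `PDFResourceManager`, `LAParams` and `process_pdf` are never bound to anything. It also uses `requests` and `io` without importing them, and `readPDF(f)` ignores its argument. The following is a hedged sketch, not a tested solution for this particular file. It assumes the installed package is pdfminer.six, where `PDFResourceManager` and `PDFPageInterpreter` live in `pdfminer.pdfinterp`, `LAParams` in `pdfminer.layout`, and `PDFPage` in `pdfminer.pdfpage`. As far as I know, `process_pdf` comes from older pdfminer releases, so the sketch iterates over pages instead. The helper names `fetch_bytes`, `looks_like_pdf` and `pdf_bytes_to_text` are made up for illustration.

```python
import io
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download the raw bytes behind `url` (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def looks_like_pdf(data):
    """Cheap sanity check: PDF files normally begin with the bytes %PDF-."""
    return data.startswith(b"%PDF-")


def pdf_bytes_to_text(pdf_bytes):
    """Return the text layer of a PDF given as raw bytes."""
    # Imported inside the function to show which module provides each name
    # that the original snippet used without importing (pdfminer.six layout).
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    device = TextConverter(rsrcmgr, retstr, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    # Feed every page through the interpreter; the text accumulates in retstr.
    for page in PDFPage.get_pages(io.BytesIO(pdf_bytes)):
        interpreter.process_page(page)
    device.close()
    return retstr.getvalue()
```

A possible use would be `data = fetch_bytes(urlpdf)`, then checking `looks_like_pdf(data)` before calling `pdf_bytes_to_text(data)`, since the server may return an HTML page rather than a PDF. Note that this only recovers the text layer. Row and column structure is not preserved, so building a DataFrame from the table would still need a separate parsing step.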