Python: How to read a Persian PDF and scrape its content?
Tags: python, python-3.x, pdf-scraping


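I am trying to read this Persian PDF, but the extracted text does not decode properly. I also tried utf-16 and utf-32, but neither produced readable output. I want to get the Persian dates into a table. I tried other libraries as well, but none of them extracted usable text.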

import io

import requests
from PyPDF2 import PdfFileReader  # PyPDF2 < 3.0 API; 3.x renames this to PdfReader

urlpdf = "https://www.codal.ir/Reports/DownloadFile.aspx?id=LG5QhAhMbfl2DrQQQaQQQ%2bkR9nMQ%3d%3d"
# verify=False skips TLS certificate verification
response = requests.get(urlpdf, verify=False, timeout=5)

with io.BytesIO(response.content) as f:
    pdf = PdfFileReader(f)
    information = pdf.getDocumentInfo()
    number_of_pages = pdf.getNumPages()
    # Metadata of the PDF
    txt = f"""
    Author: {information.author}
    Creator: {information.creator}
    Producer: {information.producer}
    Subject: {information.subject}
    Title: {information.title}
    Number of pages: {number_of_pages}
    """
    print(txt)
    # Zero-based index of the page to extract (0 is the first page)
    numpage = 0
    page = pdf.getPage(numpage)
    page_content = page.extractText() + "\n"
    # Save the extracted text as UTF-8; the with-block closes the file
    with open("extract.txt", "w", encoding="UTF-8") as g:
        g.write(page_content)
    # Print the content of the extracted page
    print(page_content)
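
One point that may help narrow this down: extractText() already returns a Python str, so re-decoding it as utf-16 or utf-32 cannot repair it. If the text is unreadable, the mapping from the PDF's font glyphs to Unicode may have been lost during extraction. Below is a minimal sketch, using only the standard library, for checking what the extracted string actually contains. The helper names script_report and normalize_persian are hypothetical, and the NFKC step only helps if the extractor emitted Arabic presentation forms (U+FB50 to U+FEFF) instead of plain letters; whether that applies to this particular file is an assumption.

```python
import unicodedata
from collections import Counter


def script_report(text):
    """Count characters by coarse category to see what the extractor produced."""
    counts = Counter()
    for ch in text:
        code = ord(ch)
        if ch.isspace():
            counts["whitespace"] += 1
        elif 0x0600 <= code <= 0x06FF:
            counts["arabic_block"] += 1        # ordinary Persian/Arabic letters
        elif 0xFB50 <= code <= 0xFDFF or 0xFE70 <= code <= 0xFEFF:
            counts["presentation_forms"] += 1  # positional glyph variants
        elif code < 0x80:
            counts["ascii"] += 1
        else:
            counts["other"] += 1
    return dict(counts)


def normalize_persian(text):
    """Fold Arabic presentation forms back to base letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "salam" spelled with presentation-form glyphs (seen, lam-alef, meem)
    sample = "\uFEB3\uFEFC\uFEE1"
    print(script_report(sample))
    print(normalize_persian(sample))
```

If the report shows mostly "ascii" or "other" characters where Persian text was expected, normalization will not help, and the problem more likely lies in the PDF's embedded font mapping than in the output encoding.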

Which other PDF scraping tools have you tried: pdfminer, camelot, or something else?

I tried pdfminer, but not camelot. Could you do it with those libraries?
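
For reference, a rough sketch of what the camelot route raised in the comments might look like. It assumes the camelot-py package (and its dependencies) is installed and that the PDF has been saved locally; the function name first_table and the file name report.pdf are hypothetical. Since camelot builds on pdfminer.six for text extraction, it may well return the same unreadable text that pdfminer did, so treat this as something to try rather than a known fix.

```python
def first_table(pdf_path, page="1"):
    """Return the first table on the given page as a pandas DataFrame, or None."""
    import camelot  # third-party: pip install camelot-py (assumed available)

    # flavor="lattice" expects ruled table lines; "stream" is the alternative
    tables = camelot.read_pdf(pdf_path, pages=page, flavor="lattice")
    if tables.n == 0:
        return None
    return tables[0].df


if __name__ == "__main__":
    df = first_table("report.pdf")  # hypothetical local copy of the PDF
    if df is not None:
        # utf-8-sig helps spreadsheet tools detect the encoding
        df.to_csv("table.csv", index=False, encoding="utf-8-sig")
```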