Python: how to avoid PDF file password errors with PDFminer


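I want to collect all the PDF files on my computer and extract the text from each one. The two functions I currently use do this, but some PDF files give me the following error:

raise PDFPasswordIncorrect 
pdfminer.pdfdocument.PDFPasswordIncorrect

I raise the error in the function that opens and reads the PDF files. That seems to work as far as ignoring the error goes, but now it ignores all PDF files, including the good ones that had no problems before.

How can I make it ignore only the PDF files that give me this error, rather than every PDF file?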

def pdfparser(x):
    try:
        raise PDFPasswordIncorrect(pdfminer.pdfdocument.PDFPasswordIncorrect)
        fp = open(x, 'rb')
        rsrcmgr = PDFResourceManager()
        retstr = io.StringIO()
        codec = 'utf-8'
        laparams = LAParams()
        device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
        # Create a PDF interpreter object.
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        # Process each page contained in the document.
    except (RuntimeError, TypeError, NameError,ValueError,IOError,IndexError,PermissionError):
         print("Error processing {}".format(name))

    for page in PDFPage.get_pages(fp):
        interpreter.process_page(page)
        data =  retstr.getvalue()

    return(data)

    def pdfs(files):
            for name in files:
                    try:
                        IP_list = (pdfparser(name))
                        keyword = re.findall(inp,IP_list)
                        file_dict['keyword'].append(keyword)
                        file_dict['name'].append(name.name[0:])
                        file_dict['created'].append(time.ctime(name.stat().st_ctime))
                        file_dict['modified'].append(time.ctime(name.stat().st_mtime))
                        file_dict['path'].append(name)
                        file_dict["content"].append(IP_list)
                    except (RuntimeError, TypeError, NameError,ValueError,IOError,IndexError,PermissionError):
                        print("Error processing {}".format(name))
                    #print(file_dict)
            return(file_dict)
    pdfs(files)
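Why do you manually raise the error that occurs when a password-protected PDF is opened without the correct password?

Your code raises this error every time it runs.

Instead, you need to catch the error if it occurs and skip that file. See the corrected code: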

def pdfparser(x):
    try:
        # open and parse your pdf here - do not raise the error yourself!
        # if it happens, catch and handle it as well
        fp = open(x, 'rb')
        # ... set up the interpreter and process the pages inside this try ...

    except PDFPasswordIncorrect as e:      # catch PDFPasswordIncorrect,
        # together with any other errors you want to handle here
        print("Error processing {}: {}".format(x, e))
        # no sense in doing anything else if you got an error up to here
        return None

    # do something with your pdf and collect data
    data = []

    return data


def pdfs(files):
    for name in files:
        try:
            IP_list = pdfparser(name)

            if IP_list is None:             # unable to read for whatever reason
                continue                    # process next file

            # do stuff with your data if you got some

        # most of these errors are already handled inside pdfparser
        except (RuntimeError, TypeError, NameError, ValueError,
                IOError, IndexError, PermissionError):
            print("Error processing {}".format(name))

    return file_dict


pdfs(files)
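The second try/except in def pdfs(files): can be narrowed: all file-related errors occur in def pdfparser(x): and are handled there. The rest of your code is incomplete and references things I do not know about:

file_dict
inp
name # used as filehandle for .stat() but is a string etc

Hi, the PDFPasswordIncorrect in the except gives me an error saying it is an undefined variable. Should I define it somewhere?

@Cald No, try except pdfminer.pdfdocument.PDFPasswordIncorrect as e: instead. It is probably hidden inside a namespace.

I put pdfminer.pdfdocument.PDFPasswordIncorrect in the except in the pdfs function rather than the pdfparser function, and it worked! Thank you very much!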
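
The pattern discussed in this thread (skip only the files that raise one specific error, keep everything else) can be sketched without pdfminer installed. This is only an illustration under stated assumptions: PasswordError is a stand-in for pdfminer.pdfdocument.PDFPasswordIncorrect, and extract_all and fake_extract are hypothetical names that do not come from the original code.

```python
from typing import Callable, Dict, Iterable, Tuple, Type


class PasswordError(Exception):
    """Stand-in for pdfminer.pdfdocument.PDFPasswordIncorrect (assumption)."""


def extract_all(
    paths: Iterable[str],
    extract: Callable[[str], str],
    skip_on: Tuple[Type[BaseException], ...],
) -> Dict[str, str]:
    """Return {path: text}, skipping only files whose extraction raises skip_on."""
    results: Dict[str, str] = {}
    for path in paths:
        try:
            # The extractor raises on its own; nothing is raised manually here.
            results[path] = extract(path)
        except skip_on as exc:
            # Only the listed exception types are swallowed; others propagate.
            print("Skipping {}: {!r}".format(path, exc))
    return results


def fake_extract(path: str) -> str:
    """Hypothetical extractor: pretends files named 'locked*' need a password."""
    if path.startswith("locked"):
        raise PasswordError(path)
    return "text of " + path


texts = extract_all(["a.pdf", "locked.pdf", "b.pdf"], fake_extract, (PasswordError,))
```

With pdfminer available, the same helper could presumably be called with a real extractor, such as the pdfparser function from the question once its manual raise is removed, and with skip_on=(PDFPasswordIncorrect,) after from pdfminer.pdfdocument import PDFPasswordIncorrect. Because the except clause sits around the call that actually parses the file, it does not matter which inner function triggers the error, which matches what the asker reported in the last comment.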