Why am I getting this error in Python PDFMiner: TypeError: can only concatenate str (not "bytes") to str

python python-3.x pdf

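I'm new to Python and am trying to convert a PDF to a txt file with PDFMiner. Every time I run my code I get: TypeError: can only concatenate str (not "bytes") to str

I'm quite confused, because the error message seems to indicate that the error comes from a file inside the pdfminer package. I know there are other questions here about this error message, but I couldn't work out my problem from them. That is probably mainly because I'm a beginner and don't understand what their code is doing, but possibly also because my problem seems to originate in files that belong to PDFMiner.

I am running the following code: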

from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage

def get_pdf_file_content(path_to_pdf):
    resource_manager = PDFResourceManager(caching=True)
    out_text = StringIO()
    laParams = LAParams()
    text_converter = TextConverter(resource_manager, out_text, laparams=laParams)
    fp = open(path_to_pdf, 'rb')
    interpreter = PDFPageInterpreter(resource_manager, text_converter)
    for page in PDFPage.get_pages(fp, pagenos=set(), maxpages=0, password="", caching=True, check_extractable=True):
        interpreter.process_page(page)

    text = out_text.getvalue()

    fp.close()
    text_converter.close()
    out_text.close()

    return text

path_to_pdf = "C:\\files\\raw\\AZO - CALLSTREET REPORT  AutoZone, Inc.(AZO), Q1 2002 Earnings Call, 5-December-2001 10 00 AM ET - 05-Dec-01.pdf"
print(get_pdf_file_content(path_to_pdf))

I get this error message:

  File "<stdin>", line 1, in <module>
  File "<stdin>", line 8, in get_pdf_file_content
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfpage.py", line 122, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 575, in __init__
    self._initialize_password(password)
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 599, in _initialize_password
    handler = factory(docid, param, password)
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 300, in __init__
    self.init()
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 307, in init
    self.init_key()
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 320, in init_key
    self.key = self.authenticate(self.password)
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 368, in authenticate
    key = self.authenticate_user_password(password)
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 374, in authenticate_user_password
    key = self.compute_encryption_key(password)
  File "C:\text_analysis\project\lib\site-packages\pdfminer\pdfdocument.py", line 351, in compute_encryption_key
    password = (password + self.PASSWORD_PADDING)[:32]  # 1
TypeError: can only concatenate str (not "bytes") to str

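There are two options here:

1) You can make the password a bytes object by writing password=b"" (note the b before the quotes that define the password).

2) You can drop that argument. The password parameter is not required (it has a default value), so if you don't specifically need it you can remove it. You will end up with: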

for page in PDFPage.get_pages(fp, pagenos=set(), maxpages=0, caching=True, check_extractable=True):
    interpreter.process_page(page)
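
To see why the type of the password matters, here is a minimal sketch that reproduces the failing expression from the last frame of the traceback without needing pdfminer at all. The PASSWORD_PADDING value below is only a stand-in (an assumption for illustration; the real constant in pdfminer has different contents), and pad_password is a hypothetical helper, not part of the library. Only the types involved are the point.

```python
# Stand-in for pdfminer's PASSWORD_PADDING: the real constant holds different
# bytes, but it is a bytes object, which is all that matters for this demo.
PASSWORD_PADDING = b"\x00" * 32


def pad_password(password):
    """Mimic the failing line from the traceback: pad, then cut to 32 items."""
    return (password + PASSWORD_PADDING)[:32]


# A str password cannot be concatenated with the bytes padding.
try:
    pad_password("")
except TypeError as exc:
    error_message = str(exc)
    print("str password fails:", error_message)

# A bytes password concatenates cleanly.
padded = pad_password(b"")
print("bytes password works, padded length:", len(padded))
```

This is why both options above help with the pdfminer version in the question: either the password is already bytes, or the library falls back to its own default.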

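I had this problem before. I set the password as bytes, passed the data to the parser as bytes, and it now converts multiple PDFs into multiple txt files for me. Here is my code: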

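from pathlib import Path

from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox, LTTextLine
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

# Placeholder folders (not given in the original answer); adjust to your setup.
# TEXTS_FOLDER must end with a path separator because it is concatenated below.
PDFS_FOLDER = "pdfs/"
TEXTS_FOLDER = "texts/"


def main():
    for path in Path(PDFS_FOLDER).glob("*.pdf"):
        with path.open("rb") as file:
            parser = PDFParser(file)
            # Pass the password as bytes (b""), not str.
            document = PDFDocument(parser, b"")
            if not document.is_extractable:
                continue

            manager = PDFResourceManager()
            params = LAParams()

            device = PDFPageAggregator(manager, laparams=params)
            interpreter = PDFPageInterpreter(manager, device)

            text = ""

            for page in PDFPage.create_pages(document):
                interpreter.process_page(page)
                # Collect the text of every text box or text line on the page.
                for obj in device.get_result():
                    if isinstance(obj, (LTTextBox, LTTextLine)):
                        text += obj.get_text()

        # Write one .txt file per PDF, named after the PDF's stem.
        with open(TEXTS_FOLDER + "{}.txt".format(path.stem), "w") as file:
            file.write(text)
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())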
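
If the password comes from somewhere else as a str (user input or a config file, say), a small helper can convert it before it is handed to a pdfminer version that expects bytes. This is only a sketch: as_password_bytes is a hypothetical name, not a pdfminer function, and the latin-1 default encoding is an assumption you may need to change.

```python
def as_password_bytes(password, encoding="latin-1"):
    """Return the password as bytes; str input is encoded, bytes pass through.

    Hypothetical helper for illustration; the default encoding is an assumption.
    """
    if isinstance(password, bytes):
        return password
    if isinstance(password, str):
        return password.encode(encoding)
    raise TypeError("password must be str or bytes, got " + type(password).__name__)


print(as_password_bytes(""))        # empty str becomes empty bytes
print(as_password_bytes(b"abc"))    # bytes are returned unchanged
```

With that in place, the call in the code above could be written as PDFDocument(parser, as_password_bytes(user_password)), where user_password is whatever value you received.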
