Python: Reading a PDF with PyPDF2 produces "superfluous whitespace" warnings
I have been using Python to read text from PDF files. I need PyPDF2 to find a given string and return the reference number that appears next to it. This is the code I am trying:
import os
import shutil
import PyPDF2
from PyPDF2 import PdfFileWriter, PdfFileReader
jobpath = r"C:\Scrpts\scr\testPDF"
for files in os.listdir(jobpath):
    if files.endswith('.pdf'):
        filename = os.path.join(jobpath, files)
        with open(filename, 'rb') as pageObj1:
            pdfReader1 = PyPDF2.PdfFileReader(pageObj1)
            pdfReader1._override_encryption = True
            pageObj1 = pdfReader1.getPage(0)
            text1 = pageObj1.extractText()
            refNum = text1.partition("Reference")
            text1 = refNum[2]
            text1 = text1[0:30]
            a = 'Reference'
            b = '\n'
            text1 = text1.split(a)[-1].split(b)[0]
            refNum = text1
            print(filename + ' ' + refNum)
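But this produces a stream of "superfluous whitespace" messages, followed by the file name with no reference number after it: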
PdfReadWarning: Superfluous whitespace found in object header b'1' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'2' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'3' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'48' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'95' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'113' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'126' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'129' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'140' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'143' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'146' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'149' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'152' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'155' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'158' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'161' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'164' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'167' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'170' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'173' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'184' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'187' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'190' b'0' [pdf.py:1668]
PdfReadWarning: Superfluous whitespace found in object header b'46' b'0' [pdf.py:1668]
C:\Scrpts\scr\testPDF\testPDF.pdf
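Note that every line above is a PdfReadWarning rather than an exception, and the script still reaches its final print. The sketch below illustrates that difference using only the standard warnings module. The PdfReadWarning class and the read_header function here are stand-ins invented for the example so that it runs without PyPDF2; they are not PyPDF2's own code. With the real library, the same filter would presumably be applied to the warning class that PyPDF2 exports. PyPDF2 1.x's PdfFileReader also accepts a strict argument, and passing strict=False may silence this message, but that depends on the installed version and should be checked rather than assumed.

```python
import warnings


class PdfReadWarning(UserWarning):
    """Stand-in for PyPDF2's warning class, so the sketch runs without PyPDF2."""


def read_header(idnum, generation):
    # Hypothetical helper: mimics a parser that complains about a sloppy
    # object header but still returns a result instead of raising.
    warnings.warn(
        "Superfluous whitespace found in object header %s %s" % (idnum, generation),
        PdfReadWarning,
    )
    return idnum, generation


def read_quietly(idnum, generation):
    # Ignore only this warning category, and only inside this block;
    # the previous warning filters are restored on exit.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=PdfReadWarning)
        return read_header(idnum, generation)
```

Filtering by category keeps other warnings visible, which matters if the PDF has a more serious structural problem than extra whitespace.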
I have used similar scripts in the past without any problems.
I tried searching for similar questions, but I could not find a solution.
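One possible reason the last output line shows no reference number, independent of the warnings: if the extracted text puts a line break directly after "Reference", then splitting the remainder on '\n' and taking element 0 yields an empty string. This is an assumption about the PDF's text layout, not something the output confirms. The sketch below is a hedged alternative to the partition/split steps; reference_after is a hypothetical helper name, and max_len=30 mirrors the 30-character slice in the question.

```python
def reference_after(text, marker="Reference", max_len=30):
    """Return the first non-empty line that follows `marker`, or '' if absent."""
    _, found, tail = text.partition(marker)
    if not found:
        return ""  # the marker does not occur in this text
    for line in tail.splitlines():
        # Drop surrounding spaces, tabs, and a separating colon.
        candidate = line.strip(" :\t")
        if candidate:
            return candidate[:max_len]
    return ""
```

For example, reference_after("Invoice\nReference\nAB-123\nDate") returns "AB-123", whereas the original split-based steps would return an empty string for the same input.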