Converting PDF files to .txt with Python 3
Tags: python, pdfminer, pdftotext
I am trying to convert multiple PDF files from a folder named FK_EPPS to text files and write them to a different folder named FK_txt. But it says there is no such file or directory. The folders are located exactly at those paths. I have tried to find a solution, but I still get the error. Can you tell me why this happens?
from io import StringIO
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
import os
import sys, getopt

#converts pdf, returns its text content as a string
def convert(fname, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)
    filepath = open(fname, 'rb')
    for page in PDFPage.get_pages(filepath, pagenums):
        interpreter.process_page(page)
    filepath.close()
    converter.close()
    text = output.getvalue()
    output.close()
    return text

def convertMultiple(pdfDir, txtDir):
    if pdfDir == "": pdfDir = os.getcwd() + "\\" #if no pdfDir passed in
    for pdf in os.listdir(pdfDir): #iterate through pdfs in pdf directory
        fileExtension = pdf.split(".")[-1]
        if fileExtension == "pdf":
            pdfFilename = pdfDir + pdf
            text = convert(pdfFilename) #get string of text content of pdf
            textFilename = txtDir + pdf + ".txt"
            textFile = open(textFilename, "w") #make text file
            textFile.write(text) #write text to text file
            textFile.close()

pdfDir = (r"FK_EPPS")
txtDir = (r"FK_txt")
convertMultiple(pdfDir, txtDir)
/usr/local/lib/python2.7/dist-packages/pdfminer/__init__.py:20: UserWarning: On January 1st, 2020, pdfminer.six will stop supporting Python 2. Please upgrade to Python 3. For more information see https://github.com/pdfminer/pdfminer.six/issues/194
  warnings.warn('On January 1st, 2020, pdfminer.six will stop supporting Python 2. Please upgrade to Python 3. For '
Traceback (most recent call last):
  File "/home/a1-re/Documents/pdftotext/1.py", line 44, in <module>
    convertMultiple(pdfDir, txtDir)
  File "/home/a1-re/Documents/pdftotext/1.py", line 36, in convertMultiple
    text = convert(pdfFilename) #get string of text content of pdf
  File "/home/a1-re/Documents/pdftotext/1.py", line 21, in convert
    filepath = file(fname, 'rb')
IOError: [Errno 2] No such file or directory: 'pdf1831150030.pdf'
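(The traceback you show cannot be correct. For your example input, the path in the error message should start with FK_EPPS.)
You forgot that a directory path and a file name must be separated from each other by the path separator appropriate for your operating system. As written, pdfDir + pdf simply glues the two strings together, so FK_EPPS and 1831150030.pdf become FK_EPPS1831150030.pdf, a file that does not exist.
You would probably have seen this at once if you had printed the value of fname at the start of the convert function. You make the same mistake for the text output file name, but that one is harder to notice because it does not raise an error; it only creates a file with the wrong name.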
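One way to apply this advice is to build both paths with os.path.join, which inserts the separator for the current operating system. The sketch below is only an illustration, not the asker's final code: build_paths and convert_multiple are hypothetical names, and the converter is passed in as a parameter so that the path handling can be tried without pdfminer installed. It also assumes the output should be named 1831150030.txt rather than 1831150030.pdf.txt, which is a choice and not something the question specifies.

```python
import os


def build_paths(pdf_dir, txt_dir, pdf_name):
    """Return (input PDF path, output text path) for one file name.

    Hypothetical helper: os.path.join adds the OS-specific separator
    that plain string concatenation leaves out.
    """
    pdf_path = os.path.join(pdf_dir, pdf_name)
    base, _ext = os.path.splitext(pdf_name)
    # Assumption: name the output "<base>.txt" instead of "<name>.pdf.txt".
    txt_path = os.path.join(txt_dir, base + ".txt")
    return pdf_path, txt_path


def convert_multiple(pdf_dir, txt_dir, convert):
    """Write one text file per PDF found in pdf_dir.

    `convert` is any callable that takes a PDF path and returns a string,
    for example the convert() function from the question.
    """
    pdf_dir = pdf_dir or os.getcwd()
    os.makedirs(txt_dir, exist_ok=True)  # create the output folder if missing
    written = []
    for name in sorted(os.listdir(pdf_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # skip anything that is not a PDF
        pdf_path, txt_path = build_paths(pdf_dir, txt_dir, name)
        print("converting", pdf_path)  # shows exactly which path is opened
        with open(txt_path, "w", encoding="utf-8") as text_file:
            text_file.write(convert(pdf_path))
        written.append(txt_path)
    return written


# The difference in one line: concatenation drops the separator.
print("FK_EPPS" + "1831150030.pdf")               # FK_EPPS1831150030.pdf
print(os.path.join("FK_EPPS", "1831150030.pdf"))  # separator inserted
```

With the question's code, the call would then be convert_multiple("FK_EPPS", "FK_txt", convert). The print inside the loop is the debugging step suggested above: if the path it shows does not exist on disk, the open call will fail in the same way.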