Python: PDFMiner to UTF-8

I'm using PDFMiner to convert a PDF to text. I then want to encode it as UTF-8, because the text is in Hebrew.

Here is the PDFMiner code:

from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO
from pdfminer.pdfparser import PDFParser


def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)

    fp = open(path, 'rb')

    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    parser.set_document(doc)

    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos=set()

    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
        interpreter.process_page(page)

    text = retstr.getvalue()

    fp.close()
    device.close()
    retstr.close()
    return text
Now, when I try to print it:

    elif file[-4:] == ".pdf":
        text = convert_pdf_to_txt("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file)
        print text
the text comes out reversed, like "rac" instead of "car", except in Hebrew.

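How can I fix this?

I tried reversing the string with slicing, but that also reverses the email addresses and phone numbers in the text, so it isn't an option.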

''.join(reversed(myString))
is not an option either :(
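
One possible direction, shown only as a rough sketch: reverse each line as a whole, then flip the left-to-right runs (Latin letters, digits, email and phone symbols) back so they read correctly again. The helper name `visual_to_logical` and the character set in `LTR_RUN` are assumptions made for illustration, not part of PDFMiner. The sketch also assumes the text is a unicode string, so in Python 2.7 the bytes returned by `convert_pdf_to_txt` would first need `.decode('utf-8')`. It is not a full implementation of the Unicode bidirectional algorithm and will mishandle some punctuation and mixed-direction lines.

```python
from __future__ import unicode_literals
import re

# Runs of Latin letters, digits and a few address/phone symbols, optionally
# joined by single spaces. This character set is an assumption; widen it
# to match the content of your documents.
LTR_RUN = re.compile('[0-9A-Za-z@._+-]+(?: [0-9A-Za-z@._+-]+)*')


def visual_to_logical(text):
    """Hypothetical helper: reverse each line, then restore the LTR runs."""
    fixed_lines = []
    for line in text.split('\n'):
        # Reversing the whole line puts the Hebrew back in reading order,
        # but leaves emails and numbers backwards.
        flipped = line[::-1]
        # Flip every LTR run a second time so it is readable again.
        fixed_lines.append(LTR_RUN.sub(lambda m: m.group(0)[::-1], flipped))
    return '\n'.join(fixed_lines)
```

With this, a line made of an email address followed by a reversed Hebrew word comes back with the Hebrew word in reading order and the address unchanged, which plain slicing or `reversed()` cannot do.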