Python: How to extract font size and font name with pdfminer.pdffont, using pdfminer as a library
I want to extract the font and the font size of each word with pdfminer. Below is the code I use to extract the PDF layout with pdfminer. What should I do to get at the PDFFont information? Please don't tell me to use the command line; I want to do this from code.

Tags: python, pdfminer
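from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBoxHorizontal
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage, PDFTextExtractionNotAllowed
from pdfminer.pdfparser import PDFParser

# Containers filled by read_pdf_miner (they were not defined in the original snippet)
text_dict = {}       # text id -> extracted text
text_prop_dict = {}  # text id -> layout object, for the page being processed
page_dict = {}       # page number -> {text id -> layout object}


def read_pdf_miner(fileObj):
    """
    This function takes the path of a PDF file, reads the file content and
    stores it in dictionaries for processing
    :param fileObj: Path of the file to read
    :return: None
    """
    with open(fileObj, 'rb') as file_pointer:
        parser = PDFParser(file_pointer)
        document = PDFDocument(parser)
        if not document.is_extractable:
            raise PDFTextExtractionNotAllowed
        rsrcmgr = PDFResourceManager()
        laparams = LAParams()
        # The aggregator device collects the layout objects of each page
        device = PDFPageAggregator(rsrcmgr, laparams=laparams)
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        page_num = 1
        text_id = 0
        for page in PDFPage.create_pages(document):
            interpreter.process_page(page)
            layout = device.get_result()
            for lt_obj in layout:
                if isinstance(lt_obj, LTTextBoxHorizontal):
                    text_dict[text_id] = lt_obj.get_text()
                    text_prop_dict[text_id] = lt_obj
                    text_id += 1
            # Keep a per-page snapshot, then reset for the next page
            page_dict[page_num] = text_prop_dict.copy()
            text_prop_dict.clear()
            page_num += 1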
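One possible way to get per-word font data from the layout objects collected above, offered as a sketch rather than a definitive answer: each `LTTextBoxHorizontal` contains `LTTextLine` objects, which in turn contain `LTChar` objects carrying `fontname` and `size` attributes, so `pdfminer.pdffont` may not need to be used directly. The helper names `iter_chars` and `group_words` are hypothetical, and taking a word's font from its first glyph is an assumption made for simplicity. Note that `size` is the rendered glyph height, which is usually close to but not necessarily identical to the nominal font size, and that `fontname` may carry a subset prefix.

```python
def iter_chars(text_box):
    """Yield (text, fontname, size) for every item in a pdfminer text box."""
    # Imported here so that group_words() below works without pdfminer installed.
    from pdfminer.layout import LTChar, LTTextLine

    for line in text_box:
        if not isinstance(line, LTTextLine):
            continue
        for obj in line:
            if isinstance(obj, LTChar):
                # Real glyph: carries the font name and the glyph size
                yield obj.get_text(), obj.fontname, obj.size
            else:
                # LTAnno: a space or newline inserted by layout analysis
                yield obj.get_text(), None, None


def group_words(chars):
    """Group (text, fontname, size) triples into (word, fontname, size) triples.

    A word ends at whitespace. Assumption: the font reported for a word is
    the font of its first glyph, even if later glyphs use a different one.
    """
    words = []
    current, font, size = [], None, None
    for text, fontname, fontsize in chars:
        if not text or text.isspace():
            if current:
                words.append(("".join(current), font, size))
                current = []
            continue
        if not current:
            font, size = fontname, fontsize
        current.append(text)
    if current:
        words.append(("".join(current), font, size))
    return words
```

Inside the loop of `read_pdf_miner`, this could be called as `group_words(iter_chars(lt_obj))` for each `LTTextBoxHorizontal`, giving a list of `(word, fontname, size)` tuples per text box.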
Did you figure it out? Yes, I figured it out. Thanks for posting the answer!