Python: navigating between text boxes with PDFMiner
I am referring to the code provided by PDFMiner:
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox, LTTextLine, LTFigure

def parse_layout(layout):
    """Function to recursively parse the layout tree."""
    for lt_obj in layout:
        print(lt_obj.__class__.__name__)
        print(lt_obj.bbox)
        if isinstance(lt_obj, LTTextBox):
            print(lt_obj.get_text())
        elif isinstance(lt_obj, LTFigure):
            parse_layout(lt_obj)  # Recursive

fp = open(r'C:\Users\lucas\Desktop\XX.pdf', 'rb')
parser = PDFParser(fp)
doc = PDFDocument(parser)
rsrcmgr = PDFResourceManager()
laparams = LAParams()
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)
for page in PDFPage.create_pages(doc):
    interpreter.process_page(page)
    layout = device.get_result()
    parse_layout(layout)
This code reads and prints the layout of the document. For example:
LTTextBoxHorizontal
(21.36, 452.28479999999996, 96.2544, 461.74559999999997)
Marks and numbers
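I specifically filter for the text boxes to print. I would like to know whether there is also a way to print one particular text box, or to navigate between text boxes. For example, I would like to print the third box from the top, or the fifth box after the current text box.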
If anyone could help, I would greatly appreciate it! :)

So sort the results by their x or y position.
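
The sorting suggestion could look roughly like the sketch below. It is not part of PDFMiner. The helper names (sort_top_to_bottom, nth_from_top, box_relative_to) and the Box stand-in are made up for illustration, and the functions only assume that each object has a .bbox attribute of the form (x0, y0, x1, y1), as in the output printed above. PDF coordinates have their origin at the bottom-left of the page, so a larger y1 means a box sits higher up.

```python
from collections import namedtuple

# Hypothetical stand-in for a layout object; only the .bbox attribute is used.
# With PDFMiner you would instead pass real boxes, e.g.
#   boxes = [obj for obj in layout if isinstance(obj, LTTextBox)]
Box = namedtuple("Box", ["name", "bbox"])


def sort_top_to_bottom(boxes):
    """Return the boxes ordered top-to-bottom, then left-to-right.

    bbox is (x0, y0, x1, y1) with the origin at the bottom-left,
    so sorting by descending y1 puts the highest box first.
    """
    return sorted(boxes, key=lambda b: (-b.bbox[3], b.bbox[0]))


def nth_from_top(boxes, n):
    """Return the n-th box from the top (1-based), or None if out of range."""
    ordered = sort_top_to_bottom(boxes)
    return ordered[n - 1] if 1 <= n <= len(ordered) else None


def box_relative_to(boxes, current, offset):
    """Return the box `offset` positions after `current` in reading order.

    A negative offset moves upward; None is returned past either end.
    """
    ordered = sort_top_to_bottom(boxes)
    # Locate `current` by identity so that equal-looking boxes are not confused.
    idx = next(i for i, b in enumerate(ordered) if b is current) + offset
    return ordered[idx] if 0 <= idx < len(ordered) else None
```

With real pages, nth_from_top(boxes, 3).get_text() would then give the text of the third box from the top, and box_relative_to(boxes, current, 5) the fifth box after the current one. Sorting only by y1 is a simplification: in multi-column layouts you may prefer to sort by x0 first, or to group boxes into columns before ordering them.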