Java: PDF split using the PDFBox library, but the resulting PDFs are almost the same size as the source PDF

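I am using the code below to split a huge PDF into two separate PDFs. The PDF is split correctly: the first PDF is generated with the first two pages of the source, and the second PDF is generated with the remaining pages.

The problem is the size. The source PDF is 17 MB, and each of the 2 generated PDFs is also 15 MB. Logically they should be smaller. I searched the forums, and they say PDFont must be used properly. I am not using PDFont here, so I am not sure whether I am doing something wrong.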

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.apache.pdfbox.exceptions.COSVisitorException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    // PDFBox 1.8.x API (loadNonSeq, getAllPages, COSVisitorException)
    public class SplitPdf {

        public static void main(String[] args) throws IOException, COSVisitorException {

            File input = new File("sourceFile.pdf");

            PDDocument inputDocument = PDDocument.loadNonSeq(input, null);
            @SuppressWarnings("unchecked") // getAllPages() returns a raw List in 1.8.x
            List<PDPage> list = inputDocument.getDocumentCatalog().getAllPages();

            // Two documents are generated from the big PDF:
            // firstOutputDocument is document 1 and holds the first 2 pages,
            // secondOutputDocument is document 2 and holds the rest of the pages.
            PDDocument firstOutputDocument = new PDDocument();
            PDDocument secondOutputDocument = new PDDocument();

            // Take the first and second page and append them to document 1
            firstOutputDocument.importPage(list.get(0));
            firstOutputDocument.importPage(list.get(1));

            // Loop over the rest of the pages (index 2 onward) and append them to document 2
            for (int page = 2; page < list.size(); ++page) {
                secondOutputDocument.importPage(list.get(page));
            }

            // Save the first document
            firstOutputDocument.save(new File("document1.pdf"));
            firstOutputDocument.close();

            // Save the second document
            secondOutputDocument.save(new File("document2.pdf"));
            secondOutputDocument.close();

            // Close the source last, after both outputs have been saved
            inputDocument.close();
        }
    }

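"Logically, it should be smaller in size" - not necessarily. If the pages share a lot of resources, the result may be appropriate. Without the PDF to reproduce the issue, it is hard to analyze.

@mkl I agree. The PDF I have contains a lot of images and text content, with tables in it. I am not allowed to share it. But I want to understand whether my code is proper, or whether I am making a mistake when appending pages with OutputDocument.importPage().

What PDFBox version are you using? Hopefully 2.0.13. See the warning in the javadoc of importPage, "... then the target document might become huge".

@Vibin "But I want to understand whether my code is proper?" - your code does not look unreasonable. Depending on the internal nature of the source document, though, some specific additions may be necessary.

@TilmanHausherr I am using the 1.8.4 version of the PDFBox jar. There is no FontBox jar in this version, so I added the 1.8.4 version of FontBox as an external jar and used it.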
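
To make the shared-resources point from the comments concrete, here is a toy model in plain Java (it does not use PDFBox). It assumes, purely for illustration, that a saved document costs roughly the sum of the distinct resources its pages reference. The 13 MB shared resource and the 1 MB per-page sizes are hypothetical numbers chosen to resemble the 17 MB / 15 MB figures in the question, and the names `SplitSizeModel` and `estimateSize` are made up for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitSizeModel {

    /**
     * Toy estimate: a document stores each distinct resource once,
     * no matter how many of its pages reference it.
     */
    public static long estimateSize(List<Set<String>> pages, Map<String, Long> resourceSizes) {
        Set<String> distinct = new LinkedHashSet<>();
        for (Set<String> page : pages) {
            distinct.addAll(page);
        }
        long total = 0;
        for (String name : distinct) {
            total += resourceSizes.get(name);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sizes in MB: one large resource used by every page,
        // plus a small page-specific resource per page.
        Map<String, Long> sizes = Map.of(
                "shared", 13L, "p1", 1L, "p2", 1L, "p3", 1L, "p4", 1L);

        List<Set<String>> pages = List.of(
                Set.of("shared", "p1"),
                Set.of("shared", "p2"),
                Set.of("shared", "p3"),
                Set.of("shared", "p4"));

        long source = estimateSize(pages, sizes);               // all four pages
        long first = estimateSize(pages.subList(0, 2), sizes);  // first two pages
        long second = estimateSize(pages.subList(2, 4), sizes); // remaining pages

        System.out.println("source=" + source + " first=" + first + " second=" + second);
        // prints "source=17 first=15 second=15"
    }
}
```

If the real file behaves like this model, both halves staying near 15 MB would be expected rather than a sign of a bug in the split code. Whether it does can only be determined by inspecting the PDF itself.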