iText: Splitting a PDF document using barcode separator pages


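I am facing the following use case:

I receive a PDF that contains many documents. Each document has a different number of pages. The documents are separated by barcode pages.

Is it possible to split a multi-page PDF that contains several documents separated by pages carrying a barcode, and to create a new PDF for each document?

I have read that we can split a PDF with iText:

But I cannot find anything online on how to split it at the point where I detect a barcode page.

Update: @mkl I found out how to read the text in a QR code using zxing. It works with a simple PNG file: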

File QRfile = new File("test.png");

BufferedImage bufferedImg = ImageIO.read(QRfile);
LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));

Result result = new MultiFormatReader().decode(bitmap);

System.out.println("Barcode Format: " + result.getBarcodeFormat());
System.out.println("Content: " + result.getText());
But it does not work inside a loop. I tested it with a PDF document (7 pages).

Here is the Java code:

PdfDocument pdfDoc;
pdfDoc = new PdfDocument(new PdfReader(pathName));
logger.debug("pdfDoc OK"); 
PdfDocumentContentParser contentParser = new PdfDocumentContentParser(pdfDoc);
for (int page = 1; page <= pdfDoc.getNumberOfPages(); page++)
{
    logger.debug("page: " + page); 
    contentParser.processContent(page, new IEventListener()
    {
        @Override
        public Set<EventType> getSupportedEvents()
        {
            logger.debug("inside getSupportedEvents"); 
            return Collections.singleton(RENDER_IMAGE);
        }

        @Override
        public void eventOccurred(IEventData data, EventType type)
        {
            index = index + 1;
            logger.debug("inside eventOccurred - data: " + data);
            logger.debug("inside eventOccurred - type: " + type);
            logger.debug("inside eventOccurred - index: " + index);
            if (data instanceof ImageRenderInfo)
            {
                logger.debug("data instanceof ImageRenderInfo"); 
                ImageRenderInfo imageRenderInfo = (ImageRenderInfo) data;
                byte[] bytes = imageRenderInfo.getImage().getImageBytes();
                try
                {
                    logger.debug("avant Files writer");
                    String pngName = "C:/alfresco/klinck/splitImage-" + index + ".png";
                    logger.debug("pngName: " + pngName);
                    Files.write(new File(pngName).toPath(), bytes);
                    logger.debug("Files written");
                    File QRfile = new File(pngName);
                    logger.debug("QR File trouvé ! ");
                    BufferedImage bufferedImg = ImageIO.read(QRfile);
                    logger.debug("bufferedImg OK ");
                    LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
                    logger.debug("source OK ");
                    BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
                    logger.debug("bitmap OK");
                    Result result = new MultiFormatReader().decode(bitmap);
                    logger.debug("SplitFluxJobExcecuter - resultBarcodeFormat: " + result.getBarcodeFormat());
                    logger.debug("SplitFluxJobExcecuter - result.getText(): " + result.getText());
                }catch (Exception e)
                {
                   logger.error("SplitJobExecuter Exception : " + ExceptionUtils.getStackTrace(e));
                }
            }
        }
        int index = 0;

    });
}
From page 2 to page 7, the error message is different:

2018-07-25 16:27:00,487 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] page: 2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside getSupportedEvents
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - data: com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo@6d41ffa2
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - type: RENDER_IMAGE
2018-07-25 16:27:00,488 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] inside eventOccurred - index: 1
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] data instanceof ImageRenderInfo
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] avant Files writer
2018-07-25 16:27:00,489 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] pngName: C:/alfresco/klinck/splitImage-1.png
2018-07-25 16:27:00,492 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] Files written
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] QR File trouvé ! 
2018-07-25 16:27:00,493 DEBUG [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] bufferedImg OK 
2018-07-25 16:27:00,493 ERROR [com.klinck.mc.jobs.SplitFluxJobExecuter] [schedulerSplit_Worker-1] SplitJobExecuter Exception : java.lang.NullPointerException
    at com.google.zxing.client.j2se.BufferedImageLuminanceSource.<init>(BufferedImageLuminanceSource.java:42)
    at com.klinck.mc.jobs.SplitFluxJobExecuter$1.eventOccurred(SplitFluxJobExecuter.java:150)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.eventOccurred(PdfCanvasProcessor.java:534)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayImage(PdfCanvasProcessor.java:573)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5800(PdfCanvasProcessor.java:108)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$ImageXObjectDoHandler.handleXObject(PdfCanvasProcessor.java:1420)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.displayXObject(PdfCanvasProcessor.java:566)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.access$5600(PdfCanvasProcessor.java:108)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor$DoOperator.invoke(PdfCanvasProcessor.java:1285)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.invokeOperator(PdfCanvasProcessor.java:452)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processContent(PdfCanvasProcessor.java:281)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor.processPageContent(PdfCanvasProcessor.java:302)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:77)
    at com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser.processContent(PdfDocumentContentParser.java:90)
    at com.klinck.mc.jobs.SplitFluxJobExecuter.execute(SplitFluxJobExecuter.java:118)
    at com.klinck.mc.jobs.SplitFluxJob$1.doWork(SplitFluxJob.java:27)
    at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:555)
    at com.klinck.mc.jobs.SplitFluxJob.executeJob(SplitFluxJob.java:24)
    at org.alfresco.schedule.ScheduledJobLockExecuter.execute(ScheduledJobLockExecuter.java:94)
    at org.alfresco.schedule.AbstractScheduledLockedJob.executeInternal(AbstractScheduledLockedJob.java:72)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)
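
A note on the stack trace above: the NullPointerException is thrown inside the BufferedImageLuminanceSource constructor, which is what one would expect if ImageIO.read returned null. ImageIO.read does that when no registered image reader recognises the input, and the bytes returned by getImageBytes() are not necessarily PNG data even though the file is named .png. Below is a minimal sketch of a guard that uses only the JDK. The names SafeImageDecode and tryDecode are made up for illustration and are not part of iText or zxing.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Optional;
import javax.imageio.ImageIO;

// Hypothetical helper: decodes image bytes without ever handing null to the caller.
final class SafeImageDecode {

    private SafeImageDecode() {
    }

    /**
     * Tries to decode the given bytes with the image readers registered in the JDK.
     * Returns an empty Optional when no reader recognises the format
     * (ImageIO.read returns null in that case) or when reading fails.
     */
    static Optional<BufferedImage> tryDecode(byte[] bytes) {
        try {
            return Optional.ofNullable(ImageIO.read(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

Inside the listener, an empty result would mean skipping that image and moving on, instead of passing null to zxing.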
Update 2

I think the error message "com.google.zxing.NotFoundException" appears because the image does not contain a text message or is too large:
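
If image size really is a factor, one thing that could be tried is scaling large images down before handing them to zxing. This is only a sketch under that assumption, using plain java.awt. ImageDownscaler, fitWithin, and the maxSide limit are hypothetical names and are not part of zxing or iText.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: shrinks an image so that its longest side is at most maxSide pixels.
final class ImageDownscaler {

    private ImageDownscaler() {
    }

    /**
     * Returns src unchanged when it already fits, otherwise a grayscale copy
     * scaled down proportionally so that the longest side equals maxSide.
     */
    static BufferedImage fitWithin(BufferedImage src, int maxSide) {
        int width = src.getWidth();
        int height = src.getHeight();
        int longest = Math.max(width, height);
        if (longest <= maxSide) {
            return src;
        }
        double scale = (double) maxSide / longest;
        int newWidth = Math.max(1, (int) Math.round(width * scale));
        int newHeight = Math.max(1, (int) Math.round(height * scale));

        BufferedImage dst = new BufferedImage(newWidth, newHeight, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, newWidth, newHeight, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The scaled image would then go into BufferedImageLuminanceSource in place of the original. Whether this helps depends on the scan, so the limit is something to experiment with.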

It worked for me with the following approach:

Step 1:

Detect the specific QR code and store the page numbers in a list:

PdfDocument pdfDoc;
pdfDoc = new PdfDocument(new PdfReader(pathName));
logger.debug("pdfDoc OK");
PdfDocumentContentParser contentParser = new PdfDocumentContentParser(pdfDoc);
List<Integer> pageList = new ArrayList<>();
int[] currentPage = new int[1];
for (int page = 1; page < [...] // read the data from the QR code
Files.write(new File(pngName).toPath(), bytes);
BufferedImage bufferedImg = ImageIO.read(image);
LuminanceSource source = new BufferedImageLuminanceSource(bufferedImg);
BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
Result result = new MultiFormatReader().decode(bitmap);
if (result.getBarcodeFormat().toString().equals("QR_CODE") && result.getText().toString().equals("separator")) {
    // store the page numbers of the Klinck QR code pages
    pageList.add(currentPage[0]);
    logger.debug(
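
For the step that comes after detection (the actual splitting), the separator page numbers collected in pageList have to be turned into page ranges, one per output document. Below is a minimal sketch of that bookkeeping in plain Java. It assumes the list is sorted in ascending order and that the separator pages themselves should be dropped from the output. SeparatorRanges and documentRanges are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText or zxing.
final class SeparatorRanges {

    private SeparatorRanges() {
    }

    /**
     * Returns inclusive {firstPage, lastPage} pairs (1-based) for every run of
     * pages that lies between separator pages. Separator pages are left out.
     * Assumes separatorPages is sorted ascending and within 1..pageCount.
     */
    static List<int[]> documentRanges(List<Integer> separatorPages, int pageCount) {
        List<int[]> ranges = new ArrayList<>();
        int start = 1;
        for (int separator : separatorPages) {
            if (separator > start) {
                // Pages from start up to the page before the separator form one document.
                ranges.add(new int[] {start, separator - 1});
            }
            start = separator + 1;
        }
        if (start <= pageCount) {
            // Trailing pages after the last separator.
            ranges.add(new int[] {start, pageCount});
        }
        return ranges;
    }
}
```

With separators on pages 1 and 4 of a 7-page file, this yields the ranges 2 to 3 and 5 to 7. Each range could then be copied into its own output file, for example with iText 7's PdfDocument.copyPagesTo(pageFrom, pageTo, toDocument).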