Manipulate paths, color etc. in iText

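I need to analyze the path data of PDF files and manipulate the content using iText 7. Manipulation includes deletion/replacement and coloring.

I can analyze the graphics well enough with code like this: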

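import java.io.File;
import java.io.IOException;

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfDocumentContentParser;

public class ContentParsing {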
    public static void main(String[] args) throws IOException {
        new ContentParsing().inspectPdf("testdata/test.pdf");
    }

    public void inspectPdf(String path) throws IOException {
        File file = new File(path);
        PdfDocument pdf = new PdfDocument(new PdfReader(file.getAbsolutePath()));
        PdfDocumentContentParser parser = new PdfDocumentContentParser(pdf);
        for (int i=1; i<=pdf.getNumberOfPages(); i++) {
            parser.processContent(i, new PathEventListener());
        }
        pdf.close();
    }
}


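import java.util.HashSet;
import java.util.Set;

import com.itextpdf.kernel.geom.IShape;
import com.itextpdf.kernel.geom.Subpath;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.PathRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

// Only RENDER_PATH events are requested below, so the cast to PathRenderInfo is safe.
public class PathEventListener implements IEventListener {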
    public void eventOccurred(IEventData eventData, EventType eventType) {
        PathRenderInfo pathRenderInfo = (PathRenderInfo) eventData;
        for ( Subpath subpath : pathRenderInfo.getPath().getSubpaths() ) {
            for ( IShape segment : subpath.getSegments() ) {
                // Here goes some path analysis code
                System.out.println(segment.getBasePoints());
            }
        }
    }

    public Set<EventType> getSupportedEvents() {
        Set<EventType> supportedEvents = new HashSet<EventType>();
        supportedEvents.add(EventType.RENDER_PATH);
        return supportedEvents;
    }
}
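Now, how can I manipulate things and write them back to the PDF? Do I have to construct an entirely new PDF document and copy everything over (in manipulated form), or can I manipulate the read PDF data directly?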

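Essentially, you are looking for a class which does not merely parse a PDF content stream and signal the instructions in it, as PdfCanvasProcessor does (the PdfDocumentContentParser you use is merely a very thin wrapper around PdfCanvasProcessor), but which also re-creates the content stream from the instructions you forward back to it.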
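
To make the "parse, signal, and write back" idea concrete without any iText dependency, here is a minimal sketch in plain Java. It is a toy model, not iText code: the class name InstructionCopier is hypothetical, and the tokenizer assumes a simplified content stream that contains only numbers, names, and operators separated by whitespace (no strings, arrays, or dictionaries). The overridable write hook plays the role that an editing subclass would override.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a content stream editor: parse instructions, then write them back. */
class InstructionCopier {
    private final StringBuilder out = new StringBuilder();

    /** Parses a simplified content stream and returns the re-created stream. */
    String edit(String content) {
        out.setLength(0);
        List<String> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty token
            }
            if (isOperand(token)) {
                operands.add(token);
            } else {
                // An operator terminates the instruction: forward it with its operands.
                write(token, operands);
                operands = new ArrayList<>();
            }
        }
        return out.toString();
    }

    /** Default behavior: copy the instruction unchanged. Override to edit. */
    protected void write(String operator, List<String> operands) {
        for (String operand : operands) {
            out.append(operand).append(' ');
        }
        out.append(operator).append('\n');
    }

    /** Assumption of this sketch: operands start with a digit, sign, dot, or slash. */
    static boolean isOperand(String token) {
        char c = token.charAt(0);
        return Character.isDigit(c) || c == '-' || c == '+' || c == '.' || c == '/';
    }

    public static void main(String[] args) {
        String source = "q 1 0 0 rg 10 10 100 50 re f Q";
        System.out.print(new InstructionCopier().edit(source));
    }
}
```

With the default write, the output is an instruction-by-instruction copy of the input; any editing effect comes from overriding that one method.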

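A generic content stream editor class

For iText 5.5.x, a proof of concept for such a content stream editor class can be found in a separate answer (the Java version is a bit further down in the answer text).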

This is a port of that proof of concept to iText 7:

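import java.io.IOException;
import java.util.List;
import java.util.Set;

// Note: PdfException is in com.itextpdf.kernel up to iText 7.1.x; in newer
// releases (7.2 and later) it is com.itextpdf.kernel.exceptions.PdfException.
import com.itextpdf.kernel.PdfException;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfLiteral;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfOutputStream;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfResources;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.IContentOperator;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

public class PdfCanvasEditor extends PdfCanvasProcessor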
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editPage(PdfDocument pdfDocument, int pageNumber) throws IOException
    {
        if ((pdfDocument.getReader() == null) || (pdfDocument.getWriter() == null))
        {
            throw new PdfException("PdfDocument must be opened in stamping mode.");
        }

        PdfPage page = pdfDocument.getPage(pageNumber);
        PdfResources pdfResources = page.getResources();
        PdfCanvas pdfCanvas = new PdfCanvas(new PdfStream(), pdfResources, pdfDocument);
        editContent(page.getContentBytes(), pdfResources, pdfCanvas);
        page.put(PdfName.Contents, pdfCanvas.getContentStream());
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editContent(byte[] contentBytes, PdfResources resources, PdfCanvas canvas)
    {
        this.canvas = canvas;
        processContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * <p>
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     * </p>
     * <p>
     * Override this method to achieve some fancy editing effect.
     * </p> 
     */
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        PdfOutputStream pdfOutputStream = canvas.getContentStream().getOutputStream();
        int index = 0;

        for (PdfObject object : operands)
        {
            pdfOutputStream.write(object);
            if (operands.size() > ++index)
                pdfOutputStream.writeSpace();
            else
                pdfOutputStream.writeNewLine();
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfCanvasEditor()
    {
        super(new DummyEventListener());
    }

    //
    // Overrides of PdfCanvasProcessor methods
    //
    @Override
    public IContentOperator registerContentOperator(String operatorString, IContentOperator operator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(operator);
        IContentOperator formerOperator = super.registerContentOperator(operatorString, wrapper);
        return formerOperator instanceof ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    //
    // member holding the output canvas
    //
    protected PdfCanvas canvas = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper implements IContentOperator
    {
        public IContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(IContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        @Override
        public void invoke(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            if (originalOperator != null && !"Do".equals(operator.toString()))
            {
                originalOperator.invoke(processor, operator, operands);
            }
            write(processor, operator, operands);
        }

        private IContentOperator originalOperator = null;
    }

    //
    // A dummy event listener to give to the underlying canvas processor to feed events to
    //
    static class DummyEventListener implements IEventListener
    {
        @Override
        public void eventOccurred(IEventData data, EventType type)
        { }

        @Override
        public Set<EventType> getSupportedEvents()
        {
            return null;
        }
    }
}
(Test methods testRemoveBoldMTTextDocument and testRemoveBigTextDocument)

This example removes all text written in a large font:

try (   InputStream resource = getClass().getResourceAsStream("document.pdf");
        PdfReader pdfReader = new PdfReader(resource);
        OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "document-noBigText.pdf"));
        PdfWriter pdfWriter = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {

        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                if (getGraphicsState().getFontSize() > 100)
                    return;
            }
            
            super.write(processor, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}
(Test method testChangeBlackTextToGreenDocument)
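
Since the question asks about paths and colors rather than text, the following plain-Java sketch shows what the decision logic of such a write override might look like for those cases. It is a toy model and not the author's code: the class name PathEditSketch is hypothetical, it works on whitespace-separated tokens instead of iText's PdfLiteral and PdfObject types, and it only recognizes black set through the rg operator (black set via g, k, or a color space operator is not handled). It recolors a black RGB fill to green and drops stroke-only paths by replacing S and s with n, the PDF operator that ends a path without painting it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model: recolor black fills to green and drop stroke-only paths. */
class PathEditSketch {
    // PDF operators that stroke a path without filling it.
    static final List<String> STROKE_ONLY = Arrays.asList("S", "s");

    static String edit(String content) {
        StringBuilder out = new StringBuilder();
        List<String> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            char c = token.charAt(0);
            // Assumption of this sketch: operands are numbers or names only.
            if (Character.isDigit(c) || c == '-' || c == '.' || c == '/') {
                operands.add(token);
                continue;
            }
            write(out, token, operands);
            operands.clear();
        }
        return out.toString();
    }

    static void write(StringBuilder out, String operator, List<String> operands) {
        if ("rg".equals(operator) && isBlack(operands)) {
            out.append("0 1 0 rg\n"); // replace black fill color with green
            return;
        }
        if (STROKE_ONLY.contains(operator)) {
            out.append("n\n"); // end the path without painting it
            return;
        }
        for (String operand : operands) {
            out.append(operand).append(' ');
        }
        out.append(operator).append('\n');
    }

    static boolean isBlack(List<String> operands) {
        if (operands.size() != 3) {
            return false;
        }
        try {
            for (String operand : operands) {
                if (Double.parseDouble(operand) != 0.0) {
                    return false;
                }
            }
        } catch (NumberFormatException e) {
            return false; // non-numeric operand: leave the instruction alone
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.print(edit("0 0 0 rg 10 10 100 50 re f 1 0 0 RG 0 0 m 50 50 l S"));
    }
}
```

In the real PdfCanvasEditor, the same decisions would be made inside an overridden write method, where the current graphics state is also available through getGraphicsState(), as in the large-font example above.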

Creating a new PDF and adding the modified content is probably the best way to get full control. Modifying an existing PDF is technically possible, and some tasks, such as adding content over or under existing content or highlighting with a different color, are quite easy with iText. Other tasks, especially text replacement or search, contain a lot of pitfalls and are technically hard. I recommend browsing through some examples and tutorials to see what is possible.

Wow, a very thorough answer! I will need some time to study it. Thanks a lot!

Just to make the usage examples complete: it seems there should be a pdfDocument.close() after the for loop in the three usage examples, right? At least if I don't add it, I only get an empty file. Or is this just a matter of Java 1.8?

@Thomas The PdfDocument pdfDocument instances in the examples are closed automatically because they are declared in the resource part of a try (here) {…} statement.

Ah, I see. Eclipse complained about try-with-resources, so I removed it to get a basic version to tinker with. I haven't done much Java before. I found that in Eclipse I have to right-click the project, choose Properties/Java Compiler, and set "Compiler compliance settings" to "1.7".

How is your PdfCanvasProcessor class licensed? I don't see a license at the location where you posted it.
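
The point about automatic closing raised in the comments can be checked in isolation. The sketch below uses a hypothetical FakeDocument class as a stand-in for any AutoCloseable resource such as a PDF document; it records the order of calls to show that close() runs when the try block ends, before any code that follows it.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {
    static final List<String> LOG = new ArrayList<>();

    /** Hypothetical stand-in for a document that must be closed to be written out. */
    static class FakeDocument implements AutoCloseable {
        void edit() {
            LOG.add("edit");
        }

        @Override
        public void close() {
            LOG.add("close");
        }
    }

    static List<String> run() {
        LOG.clear();
        try (FakeDocument document = new FakeDocument()) {
            document.edit();
        } // close() is called here automatically, even if edit() had thrown
        LOG.add("after");
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the try-with-resources statement is removed, as described in the comment about Eclipse, the close() call has to be written explicitly; otherwise the resource is never closed, which matches the empty output file reported above.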