Java PDFBox, IOException: "COSStream has been closed ..."

When I convert a PDF to images, add each image to a new PDF document, and print text on top of it, I get the following error:

java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
    at org.apache.pdfbox.cos.COSStream.getFilteredStream(COSStream.java:179)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1147)
    at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:298)
    at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:538)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:450)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1031)
    at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:401)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1314)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1215)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:991)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:963)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:951)
    at org.foo.bar.experimental.printOCROverlay(PDFDocument.java:343)
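I have text documents that were scanned and OCR-processed. Because OCR quality varies from document to document, I want to check it by printing the recognized (OCR) text on top of the original PDF.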

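I wrote a very linear method that first parses the entire text, then extracts all pages from the original PDF as images, and finally creates a new PDF document, draws the images, and prints the text. I found several posts online where this error was traced back to a PDF document being closed too early. However, I still get the error when I move all close() calls to the end.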

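Below is the method that prints the document. I have narrowed the source of the error down to the part where I print the text (the inner for loop, starting at the comment line // Draw extracted text): when I remove the section from contentStream.beginText() to contentStream.endText(), everything works. I just can't figure out what I need to change to make it work.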

By the way, I am using PDFBox 2.

protected void printOCROverlay() throws IOException {

    if (_listOfWords == null || _listOfWords.isEmpty()) {
        __log.error("No words in PDF.");
        return;
    }

    // Get all pages as image
    PDDocument pdfDocIn = PDDocument.load(new File(_inputFileName_PDF));
    PDFRenderer pdfRenderer = new PDFRenderer(pdfDocIn);
    ArrayList<BufferedImage> pageImages_List = new ArrayList<BufferedImage>();
    for (int n = 0; n < pdfDocIn.getNumberOfPages(); n++) {
        BufferedImage pageImage = null;
        try {
            pageImage = pdfRenderer.renderImageWithDPI(n, RENDER_DPI, ImageType.RGB);
            pageImages_List.add(pageImage);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    pdfDocIn.close();

    // Init new pdf-doc
    PDDocument pdDocument = new PDDocument();

    // Init first page
    PDPage page = new PDPage(PDRectangle.A4);
    pdDocument.addPage(page);

    // Init content stream
    PDPageContentStream contentStream = new PDPageContentStream(pdDocument, page, true, true);
    contentStream.setStrokingColor(_lightGrayColor);    // Bounding box color

    // Draw first image
    PDImageXObject pdImage = LosslessFactory.createFromImage(pdfDocIn, pageImages_List.get(_listOfWords.get(0).page-1));
    contentStream.drawImage(pdImage, 0, 0, PDRectangle.A4.getWidth(), PDRectangle.A4.getHeight());


    for (int i = 0; i < _listOfWords.size(); i++) {
        DocElements.Word currentWord = _listOfWords.get(i);

        // Create new page
        if (i > 0 && currentWord.page > _listOfWords.get(i-1).page) {
            contentStream.close();

            // Create new page, init new content stream
            page = new PDPage(PDRectangle.A4);
            pdDocument.addPage(page);
            contentStream = new PDPageContentStream(pdDocument, page, true, true);
            contentStream.setStrokingColor(_lightGrayColor);

            // Draw image of next page
            pdImage = LosslessFactory.createFromImage(pdfDocIn, pageImages_List.get(currentWord.page-1));
            contentStream.drawImage(pdImage, 0, 0, PDRectangle.A4.getWidth(), PDRectangle.A4.getHeight());
        }

        // Draw bounding box
        contentStream.addRect(currentWord.bBox);
        contentStream.stroke();

        // Draw extracted text
        contentStream.beginText();
        for (int c = 0; c < currentWord.word.size(); c++) {
            TextPosition currChar = currentWord.word.get(c);
            PDFont wordFont = currChar.getFont();
            contentStream.setNonStrokingColor(_ocrColor);
            contentStream.setFont(wordFont, 1.0f);
            contentStream.setTextMatrix(currChar.getTextMatrix());

            // Sometimes there are unprintable symbols...
            try {
                contentStream.showText(currChar.toString());
            } catch (IllegalArgumentException iEx) {
                __log.debug("Non-supported character [IllegalArgumentException]: " + currChar.toString());
                contentStream.setNonStrokingColor(_warnColor);
                contentStream.setFont(_font, 1.0f);
                contentStream.showText("[?]");
            } catch (NullPointerException nEx) {
                __log.debug("Non-supported character [NullPointerException]: " + currChar.toString());
                contentStream.setNonStrokingColor(_warnColor);
                contentStream.setFont(_font, 1.0f);
                contentStream.showText("[?]");
            } catch (UnsupportedOperationException usEx) {
                __log.debug("Non-supported character [UnsupportedOperationException]: " + currChar.toString());
                contentStream.setNonStrokingColor(_warnColor);
                contentStream.setFont(_font, 1.0f);
                contentStream.showText("[?]");
            }
        }
        contentStream.endText();
    }
    contentStream.close();

    String outputPath = _outputFolder + getFileNameWithoutExt(_inputFileName_PDF) + "_ocr.pdf";
    __log.info("Trying to save reconstructed file to: " + outputPath);
    pdDocument.save(outputPath);

    pdDocument.close();
}
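
According to the stack trace, you are using a version from mid-July. Please get a current snapshot or RC1 and try again. If it still happens, please edit your question to show the updated stack trace. (I couldn't find anything wrong in your code.)

Oh, maybe there is one problem: PDFont wordFont = currChar.getFont(); this is a font taken from a PDDocument that has already been closed, isn't it?

It took me a while to update to the latest version from mid-October (firewall-related problems on my machine), sorry for the late response. After posting here I also became aware of the getFont() issue and fixed it, but it didn't change anything. It ...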
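
The lifetime concern raised in the comments can be sketched as follows. This is a minimal sketch assuming PDFBox 2.0.x, not a confirmed fix for the exception reported above. It keeps the source document open until the new document has been saved, creates each image object against the new document (the question's code passes the already closed pdfDocIn to LosslessFactory.createFromImage), and uses a standard font instead of a font taken from the source. The names OverlaySketch and writeOverlay, the placeholder text, and the choice of PDType1Font.HELVETICA are illustrative assumptions and do not come from the question.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class, for illustration only.
class OverlaySketch {

    static void writeOverlay(File in, File out, float dpi) throws IOException {
        // try-with-resources closes both documents only after save() has run.
        try (PDDocument source = PDDocument.load(in);
             PDDocument target = new PDDocument()) {

            PDFRenderer renderer = new PDFRenderer(source);
            // Assumption: a standard font, so nothing drawn depends on the source document.
            PDFont font = PDType1Font.HELVETICA;

            for (int n = 0; n < source.getNumberOfPages(); n++) {
                BufferedImage image = renderer.renderImageWithDPI(n, dpi, ImageType.RGB);

                PDPage page = new PDPage(PDRectangle.A4);
                target.addPage(page);

                // The image object is created against the document it will be saved with.
                PDImageXObject pdImage = LosslessFactory.createFromImage(target, image);

                try (PDPageContentStream cs = new PDPageContentStream(target, page)) {
                    cs.drawImage(pdImage, 0, 0,
                            PDRectangle.A4.getWidth(), PDRectangle.A4.getHeight());

                    // Placeholder text; the real overlay would draw the OCR words here.
                    cs.beginText();
                    cs.setFont(font, 8f);
                    cs.newLineAtOffset(20, 20);
                    cs.showText("page " + (n + 1));
                    cs.endText();
                }
            }
            target.save(out);
        }
    }
}
```

A call such as OverlaySketch.writeOverlay(new File("scan.pdf"), new File("scan_ocr.pdf"), 150f) would write one A4 page per source page; the file names are placeholders. If the original fonts are needed for the overlay, the document from which the TextPosition objects were extracted would presumably also have to stay open until after save(), since those fonts belong to that document.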