How to select PDF text with Apache PDFBox in Java?

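I am trying to make text selectable in a PDF reader application built with JavaFX. My PDF files contain screenshots with text and an OCR layer, so I need the text to be selectable, as in a regular viewer. I have already set up rendering each page to an image, and now I am trying to figure out how to highlight text.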

I tried the following:

    InputStream is = this.getClass().getResourceAsStream(currentPdf);
    Image convertedImage;
    try {
        PDDocument document = PDDocument.load(is);
        List<PDPage> list = document.getDocumentCatalog().getAllPages();
        PDPage page = list.get(pageNum);
        List annotations = page.getAnnotations();
        PDAnnotationTextMarkup markup = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
        markup.setRectangle(new PDRectangle(600, 600));
        markup.setQuadPoints(new float[]{100, 100, 200, 100, 100, 500, 200, 500});
        annotations.add(markup);
        page.setAnnotations(annotations);
        BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 128);
        convertedImage = SwingFXUtils.toFXImage(image, null);
        document.close();
        imageView.setImage(convertedImage);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }

But this produces an image without any highlight.
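
Two things in this attempt are worth checking; they are likely causes, not a confirmed diagnosis. First, the PDFBox 1.8 renderer behind `convertToImage` paints annotations only from their appearance streams and does not generate one for a newly created highlight, so the markup may never be drawn (later PDFBox releases can build appearances for some annotation types via `constructAppearances()`; check the Javadoc of your version). Second, although the PDF specification describes QuadPoints as counterclockwise, Adobe Acrobat and many other viewers expect each quadrilateral as upper-left, upper-right, lower-left, lower-right, while the array in the question starts at the bottom edge. The helper below builds that ordering from a rectangle in PDF user space; the class name `QuadPoints` is hypothetical, not a PDFBox API.

```java
import java.util.Arrays;

public final class QuadPoints {

    private QuadPoints() {}

    /**
     * Builds the 8-number QuadPoints array for one highlighted rectangle.
     * Coordinates are PDF user space: origin at the bottom-left, y grows upward.
     * Order: upper-left, upper-right, lower-left, lower-right (the order many
     * viewers expect in practice, even though the spec text says counterclockwise).
     */
    public static float[] fromRect(float llx, float lly, float urx, float ury) {
        if (urx < llx || ury < lly) {
            throw new IllegalArgumentException("(urx, ury) must be above and to the right of (llx, lly)");
        }
        return new float[] {
                llx, ury, // upper-left
                urx, ury, // upper-right
                llx, lly, // lower-left
                urx, lly  // lower-right
        };
    }

    public static void main(String[] args) {
        // The area the question tries to highlight: x from 100 to 200, y from 100 to 500
        float[] quad = fromRect(100, 100, 200, 500);
        System.out.println(Arrays.toString(quad)); // [100.0, 500.0, 200.0, 500.0, 100.0, 100.0, 200.0, 100.0]
    }
}
```

The resulting array would go to `markup.setQuadPoints(...)`, with the rectangle passed to `markup.setRectangle(...)` enclosing the same area.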

I also tried searching Stack Overflow and other resources, but found nothing.

I would appreciate any Java code examples that highlight text with the mouse.
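
For mouse selection, one common approach (a sketch under assumptions, not taken from this thread) is to keep the bounding box of every glyph in the same pixel space as the displayed page image, build a rectangle from the mouse-press and drag positions, and select the glyphs whose boxes intersect it. With PDFBox, per-glyph positions can be collected by subclassing `PDFTextStripper` and overriding `writeString(String, List<TextPosition>)`; the `Glyph` class below is a hypothetical stand-in for that data, already converted to image pixels. It also assumes the `ImageView` shows the image unscaled, so JavaFX `MouseEvent.getX()`/`getY()` are image pixels.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public final class DragSelection {

    /** Hypothetical glyph record: its text and bounding box in image pixels (top-left origin). */
    public static final class Glyph {
        public final String text;
        public final Rectangle2D box;

        public Glyph(String text, double x, double y, double w, double h) {
            this.text = text;
            this.box = new Rectangle2D.Double(x, y, w, h);
        }
    }

    /** Rectangle spanned by the press point and the current drag point, dragged in any direction. */
    public static Rectangle2D dragRect(double x1, double y1, double x2, double y2) {
        return new Rectangle2D.Double(Math.min(x1, x2), Math.min(y1, y2),
                Math.abs(x2 - x1), Math.abs(y2 - y1));
    }

    /** Glyphs whose boxes overlap the drag rectangle, kept in their original (reading) order. */
    public static List<Glyph> select(List<Glyph> glyphs, Rectangle2D drag) {
        List<Glyph> hits = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (g.box.intersects(drag)) {
                hits.add(g);
            }
        }
        return hits;
    }

    /** Concatenated text of the selection, e.g. for copying to the clipboard. */
    public static String text(List<Glyph> selected) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : selected) {
            sb.append(g.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = List.of(
                new Glyph("H", 10, 10, 8, 12),
                new Glyph("i", 18, 10, 4, 12),
                new Glyph("!", 22, 10, 4, 12),
                new Glyph("X", 10, 40, 8, 12));
        // Drag from right to left across the first line only
        Rectangle2D drag = dragRect(24, 5, 12, 20);
        System.out.println(text(select(glyphs, drag))); // Hi!
    }
}
```

In a JavaFX handler, the press coordinates and the current drag coordinates could be fed to `dragRect`, and the selected boxes drawn as translucent rectangles over the `ImageView`.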

I used ICEpdf and did the following:

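    // ICEpdf: for every selected region, create a filled, semi-transparent square
    // annotation and add a component for it to the page view.
    // `question`, `pdfController` and `pageViewComponent` come from the surrounding application.
    question.getSelectedBounds()
            .stream()
            .map(Shape::getBounds)
            .forEach(bounds -> {
                // Square annotation covering the selected bounds
                SquareAnnotation squareAnnotation = (SquareAnnotation)
                        AnnotationFactory.buildAnnotation(
                                pdfController.getPageTree().getLibrary(),
                                Annotation.SUBTYPE_SQUARE,
                                bounds);
                // Fill with translucent yellow (alpha 120 of 255)
                squareAnnotation.setFillColor(true);
                squareAnnotation.setFillColor(new Color(255, 250, 57, 120));
                squareAnnotation.setRectangle(bounds);
                squareAnnotation.setBBox(bounds);
                // Regenerate the appearance stream so the fill is actually drawn
                squareAnnotation.resetAppearanceStream(null);
                // Wrap the annotation in a view component and attach it to the page
                AbstractAnnotationComponent annotationComponent = AnnotationComponentFactory
                        .buildAnnotationComponent(squareAnnotation, pdfController.getDocumentViewController(),
                                pageViewComponent, pdfController.getDocumentViewController().getDocumentViewModel());
                pageViewComponent.addAnnotation(annotationComponent);
            });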
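
Because the question already renders each page to a `BufferedImage`, an alternative to creating annotations (a different technique from the ICEpdf answer, offered only as a sketch) is to paint the selected rectangles directly onto the image before passing it to `SwingFXUtils.toFXImage`. This changes only what is displayed, not the PDF file. The snippet reuses the answer's translucent yellow; the class and method names are hypothetical.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

public final class HighlightPainter {

    /** Same semi-transparent yellow as the ICEpdf answer. */
    static final Color HIGHLIGHT = new Color(255, 250, 57, 120);

    /** Returns a copy of the page image with each rectangle (in image pixels) blended in yellow. */
    public static BufferedImage paint(BufferedImage page, List<Rectangle> boxes) {
        BufferedImage out = new BufferedImage(page.getWidth(), page.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.drawImage(page, 0, 0, null);
            g.setComposite(AlphaComposite.SrcOver); // blend using the color's own alpha
            g.setColor(HIGHLIGHT);
            for (Rectangle r : boxes) {
                g.fill(r);
            }
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // A plain white 50x50 "page"
        BufferedImage white = new BufferedImage(50, 50, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = white.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 50, 50);
        g.dispose();

        BufferedImage result = paint(white, List.of(new Rectangle(10, 10, 20, 10)));
        System.out.println("inside:  " + new Color(result.getRGB(15, 15)));
        System.out.println("outside: " + new Color(result.getRGB(40, 40)));
    }
}
```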

Please upload the PDF.

This is a sample. The good thing is that it does have text.

In PDFBox 2.0 there is a tool, DrawPrintTextLocations.java; please try it.

Your question is unclear: do you want a viewer with text-markup capabilities, or do you want to highlight content and then save the PDF? Basically, you render the PDF to a bitmap image (which loses all information about which pixels are text and which are not) and display that image. So you need to tell JavaFX where the text is.

@Polyakoff Also have a look at the ExtractTextByArea.java example; it gets the text from a selected area.
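
As the comment points out, the rendered bitmap carries no text positions, so text coordinates from the PDF have to be mapped into image pixels. PDF user space has 72 units per inch with the origin at the bottom-left, while the rendered image has its origin at the top-left, so a page rendered at `dpi` is scaled by `dpi / 72` and flipped vertically. The sketch below assumes an unrotated page whose crop box starts at (0, 0); rotated pages or offset crop boxes need extra terms. The class and method names are hypothetical.

```java
import java.awt.geom.Rectangle2D;

public final class PdfToImageCoords {

    private PdfToImageCoords() {}

    /** Pixels per PDF unit (PDF user space has 72 units per inch). */
    public static double scale(float dpi) {
        return dpi / 72.0;
    }

    /**
     * Maps a rectangle from PDF user space (origin bottom-left, y up, points)
     * to image pixels (origin top-left, y down) for a page rendered at dpi.
     */
    public static Rectangle2D toImage(Rectangle2D pdfRect, float pageHeightPt, float dpi) {
        double s = scale(dpi);
        double x = pdfRect.getX() * s;
        // The PDF top edge (y + height) becomes the image top edge after the vertical flip
        double y = (pageHeightPt - (pdfRect.getY() + pdfRect.getHeight())) * s;
        return new Rectangle2D.Double(x, y, pdfRect.getWidth() * s, pdfRect.getHeight() * s);
    }

    /** Inverse mapping, e.g. to turn a mouse selection on the image back into PDF coordinates. */
    public static Rectangle2D toPdf(Rectangle2D imgRect, float pageHeightPt, float dpi) {
        double s = scale(dpi);
        double w = imgRect.getWidth() / s;
        double h = imgRect.getHeight() / s;
        double x = imgRect.getX() / s;
        double y = pageHeightPt - imgRect.getY() / s - h;
        return new Rectangle2D.Double(x, y, w, h);
    }

    public static void main(String[] args) {
        // US Letter page (612 x 792 pt) rendered at 144 dpi, i.e. a scale factor of 2
        Rectangle2D pdf = new Rectangle2D.Double(100, 100, 100, 400);
        Rectangle2D img = toImage(pdf, 792f, 144f);
        System.out.println(img);
        System.out.println(toPdf(img, 792f, 144f));
    }
}
```

For the 128 dpi used in the question, the scale factor is 128 / 72 ≈ 1.78.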