Java JPedal: highlighting the word at a point in a PDF


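I want to implement a feature that lets the user double-click to highlight a word in a PDF document, using the JPedal library. This would be trivial if I could get the bounding box of a word and check whether the MouseEvent location falls inside it. The following snippet shows how I highlight a region: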

private void highlightText() {
    Rectangle highlightRectangle = new Rectangle(firstPoint.x, firstPoint.y,
            secondPoint.x - firstPoint.x, secondPoint.y - firstPoint.y);
    pdfDecoder.getTextLines().addHighlights(new Rectangle[]{highlightRectangle}, false, currentPage);
    pdfDecoder.repaint();
}

However, the only examples I can find in the documentation are for plain text extraction.

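After looking at Mark's example, I managed to get this working. There are a few quirks, so I'll explain how it works in case it helps someone else. The key method is extractTextAsWordlist(), which, given a region to extract from, returns a List<String> of the form {word1, w1_x1, w1_y1, w1_x2, w1_y2, word2, w2_x1, ...}. Step-by-step instructions follow.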

First, convert the component/screen coordinates of the MouseEvent to PDF page coordinates, correcting for scaling:

/**
 * Transforms Component coordinates to page coordinates, correcting for 
 * scaling and panning.
 * 
 * @param x Component x-coordinate
 * @param y Component y-coordinate
 * @return Point on the PDF page
 */
private Point getPageCoordinates(int x, int y) {
    float scaling = pdfDecoder.getScaling();
    int x_offset = ((pdfDecoder.getWidth() - pdfDecoder.getPDFWidth()) / 2); 
    int y_offset = pdfDecoder.getPDFHeight();
    int correctedX = (int)((x - x_offset + viewportOffset.x) / scaling);
    int correctedY = (int)((y_offset - (y + viewportOffset.y))  / scaling);
    return new Point(correctedX, correctedY);
}
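
To make the arithmetic in getPageCoordinates easier to check, here is a minimal standalone sketch of the same formula with every input passed in explicitly. The class name PageCoordinates and the parameter names are hypothetical; the formula itself is copied from the method above, so it assumes, as that method does, that the page is centred horizontally in the component and that the PDF width and height are the scaled on-screen sizes.

```java
import java.awt.Point;

/** Standalone version of the coordinate transform, for experimenting without a PdfDecoder. */
final class PageCoordinates {

    /**
     * @param x              component x-coordinate of the click
     * @param y              component y-coordinate of the click
     * @param scaling        current zoom factor (hypothetical stand-in for pdfDecoder.getScaling())
     * @param componentWidth width of the viewer component in pixels
     * @param pdfWidth       on-screen width of the page in pixels
     * @param pdfHeight      on-screen height of the page in pixels
     * @param viewportOffset scroll position of the viewport
     */
    static Point toPage(int x, int y, float scaling, int componentWidth,
                        int pdfWidth, int pdfHeight, Point viewportOffset) {
        // The page is centred, so half of the spare width sits to its left.
        int xOffset = (componentWidth - pdfWidth) / 2;
        int correctedX = (int) ((x - xOffset + viewportOffset.x) / scaling);
        // PDF y grows upwards while component y grows downwards, hence the flip.
        int correctedY = (int) ((pdfHeight - (y + viewportOffset.y)) / scaling);
        return new Point(correctedX, correctedY);
    }
}
```

For example, with a scaling of 2, a component 1000 pixels wide, a page drawn 800 by 1200 pixels and no scrolling, a click at (300, 200) maps to page point (100, 500).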
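Next, create a box to scan for text. I chose to make it the full width of the page and +/-20 page units vertically (a fairly arbitrary number), centred on the MouseEvent: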

/**
 * Scans for all the words located within a box the width of the page and
 * 40 points high, centered at the supplied point.
 * 
 * @param p Point to centre the scan box around
 * @return  A List of words within the scan box
 * @throws PdfException
 */
private List<String> scanForWords(Point p) throws PdfException {
    List<String> result = Collections.emptyList();
    if (pdfDecoder.getlastPageDecoded() > 0) {
        PdfGroupingAlgorithms currentGrouping = pdfDecoder.getGroupingObject();
        PdfPageData currentPageData = pdfDecoder.getPdfPageData();
        int x1 = currentPageData.getMediaBoxX(currentPage);
        int x2 = currentPageData.getMediaBoxWidth(currentPage) + x1;
        int y1 = p.y + 20;
        int y2 = p.y - 20;
        result = currentGrouping.extractTextAsWordlist(x1, y1, x2, y2, currentPage, true, "");
    }
    return result;
}
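
The flat list returned by scanForWords has to be turned into rectangles before any hit test can be done. The final method in this answer calls a parseWordBounds helper for that, but its body is not shown here, so the following is only a sketch of what it might look like. It assumes that each word occupies five consecutive entries (text, x1, y1, x2, y2), as in the list format described at the top of the answer, and that the coordinates are numeric strings in page units. The class name WordBoundsParser is hypothetical.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: converts {word, x1, y1, x2, y2, ...} into word rectangles. */
final class WordBoundsParser {

    static List<Rectangle> parseWordBounds(List<String> wordList) {
        List<Rectangle> bounds = new ArrayList<>();
        // Step through the list five entries at a time; an incomplete trailing group is ignored.
        for (int i = 0; i + 4 < wordList.size(); i += 5) {
            // Entry i is the word text itself; the next four are its corners.
            // Assumption: coordinates may carry a fractional part, which is truncated here.
            int x1 = (int) Float.parseFloat(wordList.get(i + 1));
            int y1 = (int) Float.parseFloat(wordList.get(i + 2));
            int x2 = (int) Float.parseFloat(wordList.get(i + 3));
            int y2 = (int) Float.parseFloat(wordList.get(i + 4));
            // Normalise so the rectangle is valid whichever corner comes first.
            bounds.add(new Rectangle(Math.min(x1, x2), Math.min(y1, y2),
                    Math.abs(x2 - x1), Math.abs(y2 - y1)));
        }
        return bounds;
    }
}
```

Taking the minimum of each coordinate pair means the result does not depend on whether the library reports the top or the bottom edge first, which matters because Rectangle.contains never matches a rectangle with negative width or height.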
Then determine which Rectangle the MouseEvent falls in:

/**
 * Finds the bounding Rectangle of a word located at a Point.
 * 
 * @param p Point to find word bounds
 * @param wordBounds List of word boundaries to search
 * @return A Rectangle that bounds a word and contains a point, or null if 
 *         there is no word located at the point
 */
private Rectangle findWordBoundsAtPoint(Point p, List<Rectangle> wordBounds) {
    Rectangle result = null;
    for (Rectangle wordBound : wordBounds) {
        if (wordBound.contains(p)) {
            result = wordBound;
            break;
        }
    }
    return result;
}
I then pass that Rectangle to this method to add the highlight:

/**
 * Highlights text on the document
 */
private void highlightText(Rectangle highlightRectangle) {
    pdfDecoder.getTextLines().addHighlights(new Rectangle[]{highlightRectangle}, false, currentPage);
    pdfDecoder.repaint();
}
Finally, all of the above calls are wrapped in this convenience method:

/**
 * Highlights the word at the given point.
 * 
 * @param p Point where word is located
 */
private void highlightWordAtPoint(Point p) {
    try {
        Rectangle wordBounds = findWordBoundsAtPoint(p, parseWordBounds(scanForWords(p)));
        if (wordBounds != null) {
            highlightText(contractHighlight(wordBounds));
        }
    } catch (PdfException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
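
The answer does not show how the double-click itself is detected, so here is one possible way to wire it up with plain AWT. This is a sketch, not part of the original code: the class name DoubleClickListener and the callback are hypothetical, and the only library calls used are the standard MouseAdapter and MouseEvent ones.

```java
import java.awt.Point;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.util.function.Consumer;

/** Hypothetical listener that forwards left-button double-clicks to a callback. */
final class DoubleClickListener extends MouseAdapter {

    private final Consumer<Point> onDoubleClick;

    DoubleClickListener(Consumer<Point> onDoubleClick) {
        this.onDoubleClick = onDoubleClick;
    }

    @Override
    public void mouseClicked(MouseEvent e) {
        // getClickCount() is 2 on the second click of a double-click.
        if (e.getClickCount() == 2 && e.getButton() == MouseEvent.BUTTON1) {
            // The point is still in component coordinates at this stage.
            onDoubleClick.accept(e.getPoint());
        }
    }
}
```

Assuming pdfDecoder is the Swing component that receives the clicks, it could be registered along the lines of `pdfDecoder.addMouseListener(new DoubleClickListener(p -> highlightWordAtPoint(getPageCoordinates(p.x, p.y))))`, so that the component coordinates are converted to page coordinates before the word lookup.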
