Java: How can I write the coordinates of images in a PDF to a JSON file?


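I have written code that builds an HTML page from a PDF document, including the images on each page.

I extracted the images from the PDF with the PDFBox library and placed them in the HTML page, and that part works. What I have not managed to obtain is the coordinates of the images, which I need for the HTML page.

So I looked into how to extract the coordinates of images in a PDF and tried to do it with PDFBox.

This is the code: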

// Both methods belong to a class PrintImageLocations that extends
// PDFStreamEngine (PDFBox 1.x API).
public static void main(String[] args) throws Exception
{
    PDDocument document = null;
    try
    {
        document = PDDocument.load(
            "/Users/tmdtjq/Downloads/PDFTest/test.pdf" );

        PrintImageLocations printer = new PrintImageLocations();
        List<?> allPages = document.getDocumentCatalog().getAllPages();
        for( int i=0; i<allPages.size(); i++ )
        {
            PDPage page = (PDPage)allPages.get( i );
            int pageNum = i+1;
            System.out.println( "Processing page: " + pageNum );
            printer.processStream( page, page.findResources(),
                page.getContents().getStream() );
        }
    }
    finally
    {
        // Release the file handle even if processing fails.
        if( document != null )
        {
            document.close();
        }
    }
}

protected void processOperator( PDFOperator operator, List arguments ) throws IOException
{
    String operation = operator.getOperation();
    if( operation.equals( "Do" ) )
    {
        // "Do" paints an XObject; only image XObjects are of interest here.
        COSName objectName = (COSName)arguments.get( 0 );
        Map<String, PDXObject> xobjects = getResources().getXObjects();
        PDXObject xobject = xobjects.get( objectName.getName() );
        if( xobject instanceof PDXObjectImage )
        {
            try
            {
                PDXObjectImage image = (PDXObjectImage)xobject;
                PDPage page = getCurrentPage();
                Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
                double rotationInRadians =(page.findRotation() * Math.PI)/180;

                // Undo the page rotation so that position and scale are
                // reported for the unrotated page.
                AffineTransform rotation = new AffineTransform();
                rotation.setToRotation( rotationInRadians );
                AffineTransform rotationInverse = rotation.createInverse();
                Matrix rotationInverseMatrix = new Matrix();
                rotationInverseMatrix.setFromAffineTransform( rotationInverse );

                Matrix unrotatedCTM = ctm.multiply( rotationInverseMatrix );
                float xScale = unrotatedCTM.getXScale();
                float yScale = unrotatedCTM.getYScale();
                float xPosition = unrotatedCTM.getXPosition();
                float yPosition = unrotatedCTM.getYPosition();

                System.out.println( "Found image[" + objectName.getName() + "] " +
                    "at " + xPosition + "," + yPosition +
                    " size=" + (xScale/100f*image.getWidth()) + "," + (yScale/100f*image.getHeight() ));
            }
            catch( NoninvertibleTransformException e )
            {
                throw new WrappedIOException( e );
            }
        }
    }
    else
    {
        // Let the engine handle every other operator (cm, q, Q, ...);
        // otherwise the current transformation matrix is never updated.
        super.processOperator( operator, arguments );
    }
}
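
The code above only prints each image's position and size. To get the values into a JSON file, they could be collected in a list and serialized once all pages have been processed. The following is a minimal sketch that needs no JSON library; the class names ImageLocation and ImageLocationJson, the field names, and the output file name are assumptions made for illustration and are not part of PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical value holder for one image placement (names are illustrative). */
final class ImageLocation {
    final int page;
    final String name;
    final float x, y, width, height;

    ImageLocation(int page, String name, float x, float y, float width, float height) {
        this.page = page;
        this.name = name;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** Renders this placement as one JSON object; Locale.ROOT keeps '.' as the decimal separator. */
    String toJson() {
        return String.format(Locale.ROOT,
            "{\"page\":%d,\"name\":\"%s\",\"x\":%.2f,\"y\":%.2f,\"width\":%.2f,\"height\":%.2f}",
            page, escape(name), x, y, width, height);
    }

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

final class ImageLocationJson {
    /** Joins all placements into a JSON array. */
    static String toJsonArray(List<ImageLocation> locations) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < locations.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(locations.get(i).toJson());
        }
        return sb.append(']').toString();
    }

    /** Writes the JSON array to the given file as UTF-8. */
    static void write(List<ImageLocation> locations, Path target) throws IOException {
        Files.write(target, toJsonArray(locations).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<ImageLocation> found = new ArrayList<>();
        // In processOperator, the println call would be replaced by an add() like this one.
        found.add(new ImageLocation(1, "Im0", 72f, 500f, 200f, 150f));
        write(found, Paths.get("image-locations.json"));
    }
}
```

In a real project a JSON library would probably be the better choice, since it takes care of escaping and number formatting; the hand-written serializer is used here only to keep the sketch free of dependencies.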

I was able to find the images by looking for the cm operator.
I overrode PDFTextStripper as follows:
Note: it does not take rotation or mirroring into account.
public static class TextFinder extends PDFTextStripper {

    public TextFinder() throws IOException {
        super();
    }

    @Override
    protected void startPage(PDPage page) throws IOException {
        // process start of the page
        super.startPage(page);
    }

    // The method being overridden is processOperator, which is protected.
    @Override
    protected void processOperator(PDFOperator operator, List<COSBase> arguments)
            throws IOException {

        // "cm" takes six operands a b c d e f. For an unrotated,
        // unmirrored image, a and d are the size and e and f the position.
        if ("cm".equals(operator.getOperation())) {
            float width = ((COSNumber)arguments.get(0)).floatValue();
            float height = ((COSNumber)arguments.get(3)).floatValue();
            float x = ((COSNumber)arguments.get(4)).floatValue();
            float y = ((COSNumber)arguments.get(5)).floatValue();
            // process image coordinates
        }
        super.processOperator(operator, arguments);
    }

    @Override
    protected void writeString(String text,
            List<TextPosition> textPositions) throws IOException {
        for (TextPosition position : textPositions) {
            // process text coordinates
        }
        super.writeString(text, textPositions);
    }
}
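
Reading operands 0 and 3 as the size is only valid when the matrix has no rotation or mirroring. A more general approach, sketched below, is to map the unit square (the space an image is painted into) through the six cm operands and take the bounding box of the result. The class CmBounds and its method are hypothetical helpers built on java.awt.geom.AffineTransform, not PDFBox API, and the sketch assumes the operands already describe the full transformation.

```java
import java.awt.geom.AffineTransform;
import java.util.Arrays;

/** Illustrative helper (not part of PDFBox): bounding box of an image placed by a matrix. */
final class CmBounds {

    /**
     * Maps the unit square through the PDF matrix [a b c d e f] and returns
     * {minX, minY, width, height} of the result in user-space units.
     */
    static double[] boundsOf(double a, double b, double c, double d, double e, double f) {
        // AffineTransform takes its six values in the same order as a PDF matrix.
        AffineTransform matrix = new AffineTransform(a, b, c, d, e, f);
        double[] corners = {0, 0,  1, 0,  0, 1,  1, 1};
        double[] mapped = new double[corners.length];
        matrix.transform(corners, 0, mapped, 0, 4);

        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < mapped.length; i += 2) {
            minX = Math.min(minX, mapped[i]);
            maxX = Math.max(maxX, mapped[i]);
            minY = Math.min(minY, mapped[i + 1]);
            maxY = Math.max(maxY, mapped[i + 1]);
        }
        return new double[] {minX, minY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        // A 200 x 150 image rotated by 90 degrees.
        double[] box = boundsOf(0, 200, -150, 0, 300, 100);
        System.out.println(Arrays.toString(box)); // prints [150.0, 100.0, 150.0, 200.0]
    }
}
```

For the unrotated case, boundsOf(200, 0, 0, 150, 72, 500) gives the same values the simpler code reads directly from the operands: position (72, 500) and size 200 x 150.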


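Of course, if you are not interested in finding text as well as images, you can use PDFStreamEngine instead of PDFTextStripper.
If every position is reported as (0,0), that is because the origin has been translated.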
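
One plausible reading of the (0,0) case is that the translation was applied by an earlier cm operator, so the cm immediately before the image carries only the scale. Since each cm is concatenated onto the current transformation matrix, and q and Q save and restore it, the position has to be read from the accumulated matrix rather than from a single operator. The sketch below illustrates that bookkeeping with java.awt.geom.AffineTransform; CtmTracker is a hypothetical class for illustration, and PDFStreamEngine already maintains this state internally.

```java
import java.awt.geom.AffineTransform;
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative tracker (not part of PDFBox) for the current transformation matrix. */
final class CtmTracker {
    private final Deque<AffineTransform> saved = new ArrayDeque<>();
    private AffineTransform ctm = new AffineTransform(); // starts as the identity

    /** Operator q: save a copy of the current matrix. */
    void save() {
        saved.push(new AffineTransform(ctm));
    }

    /** Operator Q: restore the most recently saved matrix, if there is one. */
    void restore() {
        if (!saved.isEmpty()) {
            ctm = saved.pop();
        }
    }

    /** Operator cm: the new matrix is applied first, then the previous CTM. */
    void concat(double a, double b, double c, double d, double e, double f) {
        ctm.concatenate(new AffineTransform(a, b, c, d, e, f));
    }

    double x() { return ctm.getTranslateX(); }
    double y() { return ctm.getTranslateY(); }
    double xScale() { return ctm.getScaleX(); }
    double yScale() { return ctm.getScaleY(); }

    public static void main(String[] args) {
        CtmTracker tracker = new CtmTracker();
        tracker.save();                         // q
        tracker.concat(1, 0, 0, 1, 72, 500);    // 1 0 0 1 72 500 cm  (translation only)
        tracker.concat(200, 0, 0, 150, 0, 0);   // 200 0 0 150 0 0 cm (scale only, e = f = 0)
        // At the following Do operator the accumulated matrix holds both parts.
        System.out.println(tracker.x() + "," + tracker.y()
            + " size=" + tracker.xScale() + "," + tracker.yScale());
        tracker.restore();                      // Q
    }
}
```

In this example the second cm on its own would report a position of (0,0), while the accumulated matrix places the image at (72, 500) with a size of 200 x 150.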