How do I embed XMP metadata into a multi-page PDF/A-3 file in Java?


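I am currently working on a project, a TIFF-to-PDF converter. It takes a set of scanned TIFF files and converts them into a single multi-page PDF/A-3 file. I have finished that part of the project and am now focusing on metadata handling.

My boss wants me to embed each TIFF's metadata into the corresponding page of the PDF file, and I have no idea how to do that. From my research on the PDF/A metadata structure, it seems there is only one XMP file in a PDF, and if I want to embed metadata for a particular page, I have to provide a pointer that directs it to the place I want. My basic idea so far is to extract the metadata from each TIFF file (I know how to do this step), merge it all, and write it into the PDF file. I tried to use iText, but it does not seem to support this.

Does anyone know how to do this? Is there an open tool that can do it? My main language is Java.

Thanks, everyone.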

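Your research is mostly correct.

Mostly, because it is important to distinguish between the metadata that belongs to the PDF document as a whole and the metadata that belongs to the TIFF images. The first kind is effectively limited to one instance per PDF. The second kind is independent of the PDF's own metadata, but it can be added as a file attachment, which the PDF/A-3 standard allows. Both kinds are independent of any page, so in that sense your boss's request shows a lack of understanding of the PDF format.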

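However, you can place a link annotation on each TIFF that points to its metadata, optionally stored in a second PDF file, which creates the illusion that the data is somehow present on the page.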

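Now, I must respectfully disagree with your statement that iText does not give you the tools to solve this. It handles the creation of PDF/A-x documents, including embedded files. The PDF/A-3 case is the third example.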

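As for the link annotations, they may require some knowledge of the PDF specification (embedded go-to actions) and of iText's low-level methods. I don't have a ready-made example at the moment, but I'll see whether I can put something together and add it to this answer.

EDIT: Well, this is disappointing: neither the Foxit nor the Adobe reader supports embedded go-to actions. Still, in case you are interested, below is the code I used with iText 7 to create a PDF/A-3 conformant document, with the metadata added as a separate PDF.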

public static String INTENT = "src/test/resources/StackOverflow/EmbeddedLinking/sRGB_CS_profile.icm";
public static String IMG = "src/test/resources/StackOverflow/EmbeddedLinking/itis.jpg";
public static String META = "target/output/StackOverflow/EmbeddedLinking/metadata.pdf";
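// Same directory (and same letter case) as META, so the mkdirs() call in main() also covers metadata.pdf on case-sensitive file systems
public static String DEST = "target/output/StackOverflow/EmbeddedLinking/embeddedMetaData.pdf";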

public static void main(String[] args) throws java.io.IOException {
    File file = new File(DEST);
    file.getParentFile().mkdirs();
    new EmbeddedLinking().createPdf(META);
    new EmbeddedLinking().createPdfWithEmbeddedFile(DEST,META,IMG,INTENT);
}

public void createPdf(String dest) throws IOException, FileNotFoundException{
    PdfWriter writer = new PdfWriter(dest);
    PdfDocument pdfDoc = new PdfDocument(writer);
    Document doc = new Document(pdfDoc);
    //Put some data here
    doc.add(new Paragraph("This is the metadata"));
    doc.add(new Paragraph("The Cake is Lie"));
    doc.add(new Paragraph("42"));
    doc.add(new Paragraph("The Spice must flow"));
    doc.close();
}

public void createPdfWithEmbeddedFile(String dest, String embeddedPath, String imgPath, String intent) throws java.io.IOException {
    PdfWriter writer = new PdfWriter(dest);
    PdfOutputIntent outputIntent = new PdfOutputIntent("Custom", "","http://www.color.org", "sRGB IEC61966-2.1", new FileInputStream(intent));
    PdfADocument pdfADoc = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_3A,outputIntent);

    //Setting some required parameters
    pdfADoc.setTagged();
    pdfADoc.getCatalog().setLang(new PdfString("en-US"));
    pdfADoc.getCatalog().setViewerPreferences(
            new PdfViewerPreferences().setDisplayDocTitle(true));
    PdfDocumentInfo info = pdfADoc.getDocumentInfo();
    info.setTitle("iText7 PDF/A-3 Embedded Go-To example");

    //Add attachment
    PdfDictionary parameters = new PdfDictionary();
    parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
    PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
            pdfADoc, Files.readAllBytes(Paths.get(embeddedPath)), "metadata.pdf",
            "metadata.pdf", new PdfName("application/pdf"), parameters,
            PdfName.Data, false);
    fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
    pdfADoc.addFileAttachment("metadata.pdf", fileSpec);
    PdfArray array = new PdfArray();
    array.add(fileSpec.getPdfObject().getIndirectReference());
    pdfADoc.getCatalog().put(new PdfName("AF"), array);

    //Add Image
    int imagePage = 1; //We know the image will end up on the first page since it's the only thing we add to the document
    Document doc = new Document(pdfADoc, PageSize.A4);
    Image img = new Image(ImageDataFactory.create(imgPath));
    doc.add(img);


    //Add link annotation to embedded file
    float pageHeight = PageSize.A4.getHeight();
    float imageWidth = img.getImageWidth();
    float imageHeight = img.getImageHeight();
    float x = doc.getLeftMargin();
    float y = pageHeight - doc.getTopMargin() - imageHeight;
    Rectangle linkAnnotationPosition = new Rectangle(x,y,imageWidth,imageHeight);

    PdfLinkAnnotation linkAnnotation = new PdfLinkAnnotation(linkAnnotationPosition);
    //Setup the Embedded GoTO action
    PdfExplicitDestination explicitDestination = PdfExplicitDestination.createFit(imagePage);//Destination in the target file
    PdfTargetDictionary targetDictionary = PdfTargetDictionary.createChildTarget("metadata.pdf"); //Target embedded file
    PdfAction action = PdfAction.createGoToE(fileSpec,explicitDestination,true,targetDictionary);
    linkAnnotation.setAction(action);
    //PDF/A requires the F (flags) entry to be present in every annotation dictionary. The Print flag must be 1, and the Hidden, Invisible and NoView flags must be 0.
    //See the spec for details and options, but the bit pattern represented by the integer 4 (only Print set) suffices for conformance to PDF/A-3
    int fBitArray = 4;
    linkAnnotation.put(PdfName.F,new PdfNumber(fBitArray));
    //Add annotation to page
    pdfADoc.getPage(imagePage).addAnnotation(linkAnnotation);

    //Close document
    doc.close();
}
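The value 4 written to the annotation's F entry in the code above is a bit field. As a minimal, library-free sketch of how that value can be checked, the class below encodes a few flag positions from ISO 32000-1 (Table 165) and tests whether Print is set while the flags that PDF/A forbids are clear. The class and method names are made up for illustration, and including ToggleNoView in the forbidden set reflects my reading of the PDF/A-2 and PDF/A-3 rules, so treat it as an assumption and confirm it against the standard or a validator.

```java
/** Hypothetical helper: annotation flag bits per ISO 32000-1, Table 165 (bit position n has value 2^(n-1)). */
final class AnnotationFlags {
    static final int INVISIBLE = 1;            // bit position 1
    static final int HIDDEN = 1 << 1;          // bit position 2
    static final int PRINT = 1 << 2;           // bit position 3, the value 4 used in the answer
    static final int NO_VIEW = 1 << 5;         // bit position 6
    static final int TOGGLE_NO_VIEW = 1 << 8;  // bit position 9 (assumed forbidden for PDF/A-2 and PDF/A-3)

    private AnnotationFlags() {}

    /** Returns true if Print is set and none of the forbidden flags is set. */
    static boolean isAcceptableForPdfA(int flags) {
        int forbidden = INVISIBLE | HIDDEN | NO_VIEW | TOGGLE_NO_VIEW;
        return (flags & PRINT) != 0 && (flags & forbidden) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptableForPdfA(4));              // only Print set
        System.out.println(isAcceptableForPdfA(0));              // Print missing
        System.out.println(isAcceptableForPdfA(PRINT | HIDDEN)); // a forbidden flag set
    }
}
```

Flags outside the forbidden set, such as NoZoom (value 8), do not affect the result of this check.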

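"The metadata belonging to the PDF document [...] is effectively limited to one instance per PDF." Is it? According to ISO 32000-1, any PDF stream or dictionary may have metadata attached (section 14.3.2), and I found no restriction on this in ISO 19005-3. On the contrary, it requires that all metadata streams present in the PDF conform to the XMP specification (section 6.6.2.1), which implies that there may be more than one metadata stream. Or did you mean that there may be only one metadata instance that refers to the document as a whole? Then you are right.

@mkl The latter: only one instance of metadata that belongs to the document as a whole. I will edit the answer, since the wording is indeed a bit ambiguous.

"From my research on the PDF/A metadata structure, it seems there should be only one XMP file in a PDF": usually only one metadata stream is associated with the document as a whole. There can be more, though; see my comment on Samuel's answer.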
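Since the comments point out that individual PDF objects, pages included, may carry their own metadata stream, one possible route is to build a small XMP packet per TIFF and attach it to the matching page. The sketch below only builds the packet bytes with the JDK; it is not taken from the thread. The class name, the choice of the XMP TIFF namespace, and the assumption that every map key is a valid property name in that namespace are all mine. How the bytes are attached depends on the PDF library: iText 7's PdfPage appears to offer a setXmpMetadata(byte[]) method for this, but check that against the version in use, and run the result through a PDF/A validator.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that serializes per-page TIFF properties as a minimal XMP packet. */
final class PageXmpPacket {
    private PageXmpPacket() {}

    /** Escapes the characters that are not allowed verbatim in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * Builds the packet. Keys are assumed to be valid property names of the
     * XMP TIFF schema (for example Make or ImageWidth); they are not validated here.
     */
    static byte[] build(Map<String, String> tiffProperties) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n");
        sb.append("<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n");
        sb.append(" <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n");
        sb.append("  <rdf:Description rdf:about=\"\" xmlns:tiff=\"http://ns.adobe.com/tiff/1.0/\">\n");
        for (Map.Entry<String, String> e : tiffProperties.entrySet()) {
            sb.append("   <tiff:").append(e.getKey()).append('>')
              .append(escapeXml(e.getValue()))
              .append("</tiff:").append(e.getKey()).append(">\n");
        }
        sb.append("  </rdf:Description>\n");
        sb.append(" </rdf:RDF>\n");
        sb.append("</x:xmpmeta>\n");
        sb.append("<?xpacket end=\"w\"?>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Illustrative values only; real values would come from the TIFF reader.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Make", "Example Scanner & Co");
        props.put("ImageWidth", "2480");
        System.out.println(new String(build(props), StandardCharsets.UTF_8));
    }
}
```

Building one packet per page keeps each TIFF's metadata separate from the document-level XMP stream, which PDF/A still requires once for the document as a whole.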