Editing hyperlinks and anchors in a PDF using iTextSharp


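I am using the iTextSharp library with C#/.NET to split my PDF file.

Consider a PDF file named sample.pdf that contains 72 pages. Some pages of sample.pdf contain hyperlinks that navigate to other pages. For example, page 4 has three hyperlinks that, when clicked, navigate to pages 24, 27, and 28 respectively. Like page 4, nearly 12 pages have such hyperlinks.

Using iTextSharp, I have split this PDF into 72 separate files, saved as 1.pdf, 2.pdf, … 72.pdf. So in 4.pdf, clicking those hyperlinks needs to navigate to 24.pdf, 27.pdf, and 28.pdf.

Please help me. How can I edit the hyperlinks in 4.pdf so that they navigate to the corresponding PDF files?

Thanks,
Ashok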

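What you want is entirely possible. You will need to work with low-level PDF objects (PdfDictionary, PdfArray, and so on).

Whenever someone needs to work with these objects, I always refer them to the PDF Reference. In your case, you will want to look at chapter 7 (particularly section 3) and chapter 12, sections 3 (document-level navigation) and 5 (annotations).

Assuming you have read that, here is what you need to do:

  1. Step through each page's annotation array (in the original document, before splitting it).
    1.1. Find all the link annotations and their destinations.
    1.2. Build a new destination for each link, corresponding to the new file.
    1.3. Write the new destination into the link annotation.
  2. Write the page to a new PDF using PdfCopy (it copies the annotations as well as the page contents).

Step 1.1 isn't simple. There are several different "local goto" annotation formats, and you need to determine which page a given link points to. Some links may say the PDF equivalent of "next page" or "previous page", while others include a reference to a specific page. That will be an "indirect object reference", not a page number.

To determine a page number from a page reference, you need to... ouch. OK. The most efficient way is to call PdfReader.GetPageRef(int pageNum) for every page in the original document and cache the results in a map (reference -> pageNum).

You can then build "remote goto" links by creating a remote goto action and writing it into the link annotation's "A" (action) entry, removing anything that was there before (probably a "Dest").

My C# isn't very good, so I'll leave the actual implementation to you.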
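The steps above could be sketched in C# roughly as follows. This is a minimal sketch and not a tested implementation. It assumes iTextSharp 5.x, assumes the output files are named `<page>.pdf` in a single folder, and handles only links whose destination is an explicit array (a `/Dest` entry or a `GoTo` action), leaving named destinations and named actions untouched. The class and method names (`SplitAndRelink`, `Run`, `RetargetLinks`) are made up for illustration, and the page reference is fetched with `GetPageOrigRef`, which as far as I know is the name iTextSharp 5.x uses for the per-page reference lookup.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class SplitAndRelink
{
    // Assumption: each page p is written to "<outputFolder>/<p>.pdf".
    public static void Run(string sourcePath, string outputFolder)
    {
        var reader = new PdfReader(sourcePath);
        try
        {
            // Cache: object number of each page's indirect reference -> 1-based page number.
            var pageMap = new Dictionary<int, int>();
            for (int p = 1; p <= reader.NumberOfPages; p++)
                pageMap[reader.GetPageOrigRef(p).Number] = p;

            for (int p = 1; p <= reader.NumberOfPages; p++)
            {
                RetargetLinks(reader.GetPageN(p), pageMap);

                // PdfCopy carries the (now modified) annotations along with the page.
                var doc = new Document();
                using (var fs = new FileStream(Path.Combine(outputFolder, p + ".pdf"), FileMode.Create))
                {
                    var copy = new PdfCopy(doc, fs);
                    doc.Open();
                    copy.AddPage(copy.GetImportedPage(reader, p));
                    doc.Close();
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }

    private static void RetargetLinks(PdfDictionary page, Dictionary<int, int> pageMap)
    {
        PdfArray annots = page.GetAsArray(PdfName.ANNOTS);
        if (annots == null) return;

        for (int i = 0; i < annots.Size; i++)
        {
            PdfDictionary annot = annots.GetAsDict(i);
            if (annot == null || !PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE))) continue;

            // The destination sits either in /Dest or in the /D entry of a GoTo action.
            PdfArray dest = annot.GetAsArray(PdfName.DEST);
            if (dest == null)
            {
                PdfDictionary action = annot.GetAsDict(PdfName.A);
                if (action != null && PdfName.GOTO.Equals(action.GetAsName(PdfName.S)))
                    dest = action.GetAsArray(PdfName.D);
            }
            if (dest == null || dest.Size == 0) continue; // URIs, named destinations, etc. are left alone

            // The first element of an explicit destination is the target page reference.
            PdfIndirectReference target = dest.GetAsIndirectObject(0);
            int targetPage;
            if (target == null || !pageMap.TryGetValue(target.Number, out targetPage)) continue;

            // Remote destinations address pages by zero-based index; each split file has one page.
            var remoteDest = new PdfArray();
            remoteDest.Add(new PdfNumber(0));
            remoteDest.Add(PdfName.FIT);

            var gotoR = new PdfDictionary();
            gotoR.Put(PdfName.S, PdfName.GOTOR);
            gotoR.Put(PdfName.F, new PdfString(targetPage + ".pdf"));
            gotoR.Put(PdfName.D, remoteDest);

            // Replace whatever was there before with the remote goto action.
            annot.Remove(PdfName.DEST);
            annot.Put(PdfName.A, gotoR);
        }
    }
}
```

Something like `SplitAndRelink.Run(@"C:\pdfs\sample.pdf", @"C:\pdfs\out")` would then produce the split files (both paths are placeholders). The relative file name in `/F` only resolves if the viewer opens the split files from the same folder.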

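OK, based on what @Mark Storer wrote, here is some starter code. The first method creates a sample PDF with 10 pages and some links on the first page that jump to different parts of the PDF, so that we have something to work with. The second method opens the PDF created by the first, walks through each annotation, tries to work out which page the annotation links to, and writes the result to the trace window. The code is in VB but can easily be converted to C# if needed. It targets iTextSharp 5.1.1.0.

If I get a chance I might try to take this further and actually split and re-link things, but I don't have the time right now.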

    Option Explicit On
    Option Strict On
    
    Imports iTextSharp.text
    Imports iTextSharp.text.pdf
    Imports System.IO
    
    Public Class Form1
        ''//Folder that we are working in
        Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs")
        ''//Sample PDF
        Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Sample.pdf")
    
        Private Shared Sub CreateSamplePdf()
            ''//Create our output directory if it does not exist
            Directory.CreateDirectory(WorkingFolder)
    
            ''//Create our sample PDF
            Using Doc As New iTextSharp.text.Document(PageSize.LETTER)
                Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read)
                    Using writer = PdfWriter.GetInstance(Doc, FS)
                        Doc.Open()
    
                        ''//Turn our hyperlinks blue
                        Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE)
    
                        ''//Create 10 pages with simple labels on them
                        For I = 1 To 10
                            Doc.NewPage()
                            Doc.Add(New Paragraph(String.Format("Page {0}", I)))
                            ''//On the first page add some links
                            If I = 1 Then
    
                                ''//Go to pages relative to this page
                                Doc.Add(New Paragraph(New Chunk("First Page", BlueFont).SetAction(New PdfAction(PdfAction.FIRSTPAGE))))
    
                                Doc.Add(New Paragraph(New Chunk("Next Page", BlueFont).SetAction(New PdfAction(PdfAction.NEXTPAGE))))
    
                                Doc.Add(New Paragraph(New Chunk("Prev Page", BlueFont).SetAction(New PdfAction(PdfAction.PREVPAGE)))) ''//This one does not make sense but is here for completeness
    
                                Doc.Add(New Paragraph(New Chunk("Last Page", BlueFont).SetAction(New PdfAction(PdfAction.LASTPAGE))))
    
                                ''//Go to a specific hard-coded page number
                                Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(0), writer))))
                            End If
                        Next
                        Doc.Close()
                    End Using
                End Using
            End Using
        End Sub
        Private Shared Sub ListPdfLinks()
    
            ''//Setup some variables to be used later
            Dim R As PdfReader
            Dim PageCount As Integer
            Dim PageDictionary As PdfDictionary
            Dim Annots As PdfArray
    
            ''//Open our reader
            R = New PdfReader(BaseFile)
            ''//Get the page count
            PageCount = R.NumberOfPages
    
            ''//Loop through each page
            For I = 1 To PageCount
                ''//Get the current page
                PageDictionary = R.GetPageN(I)
    
                ''//Get all of the annotations for the current page
                Annots = PageDictionary.GetAsArray(PdfName.ANNOTS)
    
                ''//Make sure we have something
                If (Annots Is Nothing) OrElse (Annots.Size = 0) Then Continue For
    
                ''//Loop through each annotation
                For Each A In Annots.ArrayList
    
                    ''//I do not completely understand this but I think this turns an Indirect Reference into an actual object, but I could be wrong
                    ''//Anyway, convert the itext-specific object as a generic PDF object
                    Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary)
    
                    ''//Make sure this annotation has a link
                    If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For
    
                    ''//Make sure this annotation has an ACTION
                    If AnnotationDictionary.Get(PdfName.A) Is Nothing Then Continue For
    
                    ''//Get the ACTION for the current annotation
                    Dim AnnotationAction = DirectCast(AnnotationDictionary.Get(PdfName.A), PdfDictionary)
    
                    ''//Test if it is a named actions such as /FIRST, /LAST, etc
                    If AnnotationAction.Get(PdfName.S).Equals(PdfName.NAMED) Then
                        Trace.Write("GOTO:")
                        If AnnotationAction.Get(PdfName.N).Equals(PdfName.FIRSTPAGE) Then
                            Trace.WriteLine(1)
                        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.NEXTPAGE) Then
                            Trace.WriteLine(Math.Min(I + 1, PageCount)) ''//Any links that go past the end of the document should just go to the last page
                        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.LASTPAGE) Then
                            Trace.WriteLine(PageCount)
                        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.PREVPAGE) Then
                            Trace.WriteLine(Math.Max(I - 1, 1)) ''//Any links that go before the first page should just go to the first page
                        End If
    
    
                        ''//Otherwise see if its a GOTO page action
                    ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.GOTO) Then
    
                        ''//Make sure that it has a destination
                        If AnnotationAction.GetAsArray(PdfName.D) Is Nothing Then Continue For
    
                        ''//Once again, not completely sure if this is the best route but the ACTION has a sub DESTINATION object that is an Indirect Reference.
                        ''//The code below gets that IR, asks the PdfReader to convert it to an actual page and then loop through all of the pages
                        ''//to see which page the IR points to. Very inefficient but I could not find a way to get the page number based on the IR.
    
                        ''//AnnotationAction.GetAsArray(PdfName.D) gets the destination
                        ''//AnnotationAction.GetAsArray(PdfName.D).ArrayList(0) get the indirect reference part of the destination (.ArrayList(1) has fitting options)
                        ''//DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference) turns it into a PRIndirectReference
                        ''//The full line gets us an actual page object (actually I think it could be any type of pdf object but I have not tested that).
                        ''//BIG NOTE: This line really should have a bunch more sanity checks in place
                        Dim AnnotationReferencedPage = PdfReader.GetPdfObject(DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference))
                        Trace.Write("GOTO:")
                        ''//Re-loop through all of the pages in the main document comparing them to this page
                        For J = 1 To PageCount
                            If AnnotationReferencedPage.Equals(R.GetPageN(J)) Then
                                Trace.WriteLine(J)
                                Exit For
                            End If
                        Next
                    End If
                Next
            Next

            ''//Release the file handle held by the reader
            R.Close()
        End Sub
    
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            CreateSamplePdf()
            ListPdfLinks()
            Me.Close()
        End Sub
    End Class
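For readers working in C#, the named-action branch of the VB code might translate roughly as below. This is a sketch under the same iTextSharp 5.x assumption. The class and method names are hypothetical, and returning 0 for "not a recognised named action" is a convention chosen here for illustration only.

```csharp
using System;
using iTextSharp.text.pdf;

public static class NamedActions
{
    // Returns the 1-based page a /Named action leads to, or 0 if the action
    // is not one of the four standard page-navigation names.
    public static int ResolveTargetPage(PdfDictionary action, int currentPage, int pageCount)
    {
        if (action == null || !PdfName.NAMED.Equals(action.GetAsName(PdfName.S))) return 0;

        PdfName name = action.GetAsName(PdfName.N);
        if (PdfName.FIRSTPAGE.Equals(name)) return 1;
        if (PdfName.LASTPAGE.Equals(name)) return pageCount;
        // Clamp relative jumps to the bounds of the document, as the VB code does.
        if (PdfName.NEXTPAGE.Equals(name)) return Math.Min(currentPage + 1, pageCount);
        if (PdfName.PREVPAGE.Equals(name)) return Math.Max(currentPage - 1, 1);
        return 0;
    }
}
```

Once a named action has been resolved to a concrete page number, it can be re-targeted in the same way as an explicit destination.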
    

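The function below uses iTextSharp to:

  • Open the PDF
  • Page through the PDF
  • Check the annotations on each page to see whether they are anchors (links)
  • Step four is where you insert whatever logic you want: update the links, log them, and so on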

        // Requires: using System; using iTextSharp.text.pdf; using iTextSharp.text.exceptions;
        // PdfFiles is not shown in the original; it is assumed to be a collection of FileInfo.

        /// <summary>Inspects PDF files for link annotations.
        /// </summary>
        public static void FindPdfDocsWithInternalLinks()
        {
            foreach (var fi in PdfFiles) {
                PdfReader reader = null;
                try {
                    reader = new PdfReader(fi.FullName);
                    // Walk through every page
                    for (var i = 1; i <= reader.NumberOfPages; i++) {
                        var pageDict = reader.GetPageN(i);
                        var annotArray = PdfReader.GetPdfObject(pageDict.Get(PdfName.ANNOTS)) as PdfArray;
                        if (annotArray == null) continue;
                        if (annotArray.Size <= 0) continue;
                        // Check every annotation on the page
                        foreach (var annot in annotArray.ArrayList) {
                            var annotDict = PdfReader.GetPdfObject(annot) as PdfDictionary;
                            if (annotDict == null) continue;
                            var subtype = annotDict.Get(PdfName.SUBTYPE);
                            if (subtype == null || subtype.ToString() != "/Link") continue;
                            var linkDict = annotDict.GetDirectObject(PdfName.A) as PdfDictionary;
                            if (linkDict == null) continue;
                            // If it makes it this far, it's a link annotation with an action.
                            // Only URI actions have a /URI entry; GoTo and Named actions do not,
                            // so check for null instead of relying on the exception handler.
                            var uriObj = linkDict.Get(PdfName.URI);
                            if (uriObj == null) continue;
                            var sUri = uriObj.ToString();
                            if (String.IsNullOrEmpty(sUri)) continue;
                            // sUri now holds the link target; update or log it here
                        }
                    }
                }
                catch (InvalidPdfException e)
                {
                    if (!fi.FullName.Contains("_vti_cnf"))
                        Console.WriteLine("\r\nInvalid PDF Exception\r\nFilename: " + fi.FullName + "\r\nException:\r\n" + e);
                }
                catch (NullReferenceException e)
                {
                    if (!fi.FullName.Contains("_vti_cnf"))
                        Console.WriteLine("\r\nNull Reference Exception\r\nFilename: " + fi.Name + "\r\nException:\r\n" + e);
                }
                finally
                {
                    // Always release the file, even when an exception was thrown
                    if (reader != null) reader.Close();
                }
            }

            // DO WHATEVER YOU WANT HERE
        }
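Not every link annotation carries a `/URI` entry, so it can help to classify each one before deciding what to do with it. The sketch below assumes iTextSharp 5.x; `LinkKinds.Describe` and the labels it returns are invented for illustration, and it only covers the target types discussed in this thread.

```csharp
using iTextSharp.text.pdf;

public static class LinkKinds
{
    // Returns a short label describing how a link annotation points at its target.
    public static string Describe(PdfDictionary annot)
    {
        // A link may carry its destination directly, without any action.
        if (annot.Get(PdfName.DEST) != null) return "Dest (internal)";

        PdfDictionary action = annot.GetAsDict(PdfName.A);
        if (action == null) return "no target";

        // The /S entry of the action dictionary names the action type.
        PdfName type = action.GetAsName(PdfName.S);
        if (PdfName.URI.Equals(type)) return "URI action: " + action.GetAsString(PdfName.URI);
        if (PdfName.GOTO.Equals(type)) return "GoTo action (internal, target in /D)";
        if (PdfName.GOTOR.Equals(type)) return "GoToR action (other file, name in /F)";
        if (PdfName.NAMED.Equals(type)) return "Named action: " + action.GetAsName(PdfName.N);
        return "other action: " + type;
    }
}
```

Calling `LinkKinds.Describe(annotDict)` on each `/Link` annotation and writing the result to the console is a quick way to see which target types a given document actually uses.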
    
Hi Mark, thanks for your help. I am going through the document. Could you provide some sample code? I need to finish and deliver this as soon as possible.

@MarkStorer - Step 1.1 is where I get lost, and the PDF Reference isn't much help because of iTextSharp's objects and poor documentation. I can find all of the Subtype=/Link annotations, but there are different types, and their keys/elements differ. I have edited and expanded the question here: