How to split a PDF into multiple documents

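I have a large PDF that was created by combining multiple documents.

How can I split the PDF back into multiple documents using a keyword delimiter?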

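As well as Adobe Reader, you will need Adobe Acrobat.

Add the following script using the Action Wizard:

Paste in the following script and modify it as needed. See the // comments for help with customization.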

/* Extract Pages into Documents by Keyword */
// Iterates over all pages and find a given string and extracts all 
// pages on which that string is found to a new file.

var pageArray = [];
var pageArrayEnd = [];

var stringToSearchFor = app.response("This Action Script splits the document by a keyword on each X number of pages, please enter the keyword:");

for (var p = 0; p < this.numPages; p++) {
    // iterate over all words on the page
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        // DEBUGGING HELP: uncomment the next line to see each word as Acrobat reads it.
        //app.alert("Word is " + this.getPageNthWord(p, n));
        // To match several consecutive words, add a prompt for a second search word
        // (e.g. stringToSearchForTWO), iterate one word less, and test both words:
        //   for (var n = 0; n < this.getPageNumWords(p) - 1; n++) ...
        //   if ((this.getPageNthWord(p, n) == stringToSearchFor) && (this.getPageNthWord(p, n + 1) == stringToSearchForTWO)) {...
        if (this.getPageNthWord(p, n) == stringToSearchFor) {
            //app.alert("Found word on page " + p + " word number " + n, 3);
            if (pageArray.length > 0) {
                pageArrayEnd.push(p - 1);
            }
            pageArray.push(p);
            break;
        }
    }
}

pageArrayEnd.push(this.numPages - 1);
//app.alert("Number of sub documents " + pageArray.length, 3);
if (pageArray.length > 0) {
    // extract all pages that contain the string into a new document
    for (var n = 0; n < pageArray.length; n++) {
        var d = app.newDoc();    // this will add a blank page - we need to remove that once we are done
        //app.alert("New Doc using pages " + pageArray[n] + " to " + pageArrayEnd[n], 3);
        d.insertPages({
            nPage: d.numPages - 1,
            cPath: this.path,
            nStart: pageArray[n],
            nEnd: pageArrayEnd[n]
        });
        // remove the first page
        d.deletePages(0);
        d.saveAs({ cPath: this.path.replace(".pdf","") + n + ".pdf" });
        d.closeDoc(true);
    }
}
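
The page-range logic of the script can be separated from the Acrobat calls, which makes it easier to check outside Acrobat. The following is a minimal sketch, not the answer's own code. `findSplitRanges` and `splitDocument` are hypothetical names introduced here for illustration. The sketch also supports a keyword made of several consecutive words, as the debugging comment in the script hints at. Following a suggestion made in the discussion of this answer, it assumes that `extractPages()` returns a Doc object when no output path is given, so `app.newDoc()` and the blank-page cleanup are skipped. That assumption should be verified in your Acrobat version.

```javascript
// Pure helper: pagesWords holds one array of words per page.
// A new sub-document starts on every page that contains the keyword
// sequence; it ends on the page before the next start (or the last page).
function findSplitRanges(pagesWords, keywordWords) {
    if (keywordWords.length === 0) {
        return [];
    }
    var starts = [];
    for (var p = 0; p < pagesWords.length; p++) {
        var words = pagesWords[p];
        // Stop early enough that the whole keyword sequence still fits.
        for (var n = 0; n + keywordWords.length <= words.length; n++) {
            var matched = true;
            for (var k = 0; k < keywordWords.length; k++) {
                if (words[n + k] !== keywordWords[k]) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                starts.push(p);
                break; // one match per page is enough
            }
        }
    }
    var ranges = [];
    for (var i = 0; i < starts.length; i++) {
        var end = (i + 1 < starts.length) ? starts[i + 1] - 1 : pagesWords.length - 1;
        ranges.push({ start: starts[i], end: end });
    }
    return ranges;
}

// Acrobat-side wrapper (assumption: doc.extractPages without cPath
// returns the new Doc, which is then saved and closed).
function splitDocument(doc, keywordWords) {
    var pagesWords = [];
    for (var p = 0; p < doc.numPages; p++) {
        var words = [];
        for (var n = 0; n < doc.getPageNumWords(p); n++) {
            words.push(doc.getPageNthWord(p, n));
        }
        pagesWords.push(words);
    }
    var ranges = findSplitRanges(pagesWords, keywordWords);
    for (var i = 0; i < ranges.length; i++) {
        var part = doc.extractPages({ nStart: ranges[i].start, nEnd: ranges[i].end });
        part.saveAs({ cPath: doc.path.replace(".pdf", "") + i + ".pdf" });
        part.closeDoc(true);
    }
    return ranges;
}
```

Inside an Action, this would be called as `splitDocument(this, [stringToSearchFor])`. As in the original script, pages that come before the first keyword match are not written to any output file.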

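You do actually need Adobe Acrobat. Adobe Reader cannot run this script, because it cannot insert or extract pages. If you have Acrobat, you also have Distiller, but I see no reason to use Distiller in this case.

Another thing to keep in mind: a "word" in the PDF may not match the word we see in the text. Depending on the software that created the PDF, kerning can break a single word into several "words" in the PDF. Tomorrow I will post a script that shows each getPageNthWord of a page in the console.

Thanks for the feedback.

The app.newDoc() step may not be necessary; extracting the first found page and then inserting the remaining pages into the new document should be enough. extractPages() returns a Doc object on success. This also avoids possible problems with arbitrary page sizes.

One last note: a "smart search" approach would use the redaction function (which can also use regular expressions to find complex words), then walk through the annotations and read the page numbers from them. It is a bit of a detour, but more powerful. Just remember not to apply the redactions, and to discard all the annotations once you are done.

Take a look at this guide:
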
// Used to register all DLL assemblies.
WorkRegistry.Reset();

String inputFilePath = Program.RootPath + "\\" + "1.pdf";
String outputFileName = "Output";
int[] splitIndex = new int[3] { 1, 3, 5 }; // Valid value for each index: 1 to (Page Count - 1).

// Create output PDF file path list
List<String> outputFilePaths = new List<String>();
for (int i = 0; i <= splitIndex.Length; i++)
{
    outputFilePaths.Add(Program.RootPath + "\\" + outputFileName + "_" + i.ToString() + ".pdf");
}

// Split input PDF file to 4 files:
// File 0: page 0.
// File 1: page 1 ~ 2.
// File 2: page 3 ~ 4.
// File 3: page 5 ~ the last page.
PDFDocument.SplitDocument(inputFilePath, splitIndex, outputFilePaths.ToArray());
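
The comments in the snippet describe how the split indexes map to output files: N indexes produce N + 1 files, which is why the path loop runs with `i <= splitIndex.Length`. The mapping can be sketched as a small JavaScript function. `splitIndexToRanges` is a hypothetical helper written for illustration, not part of the library, and it assumes the indexes are sorted in ascending order and lie between 1 and (page count - 1).

```javascript
// Convert split indexes into zero-based, inclusive page ranges.
// Each index marks the first page of the next output file.
function splitIndexToRanges(splitIndex, pageCount) {
    var ranges = [];
    var start = 0;
    for (var i = 0; i < splitIndex.length; i++) {
        ranges.push({ start: start, end: splitIndex[i] - 1 });
        start = splitIndex[i];
    }
    // The last file runs from the final split index to the last page.
    ranges.push({ start: start, end: pageCount - 1 });
    return ranges;
}

// For an 8-page document this yields the four ranges listed in the
// comments above: 0-0, 1-2, 3-4, 5-7.
var exampleRanges = splitIndexToRanges([1, 3, 5], 8);
```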