Extract folders from a portfolio PDF in Java


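I have a portfolio PDF that contains folders, subfolders and files. I need to extract the same structure of folders, subfolders and files using iText in Java. So far I only get the embedded files. How can I get the folders as well?

Please find the code I am using below. It only returns the files, not the folders that contain them.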

public static void extractAttachments(String src, String dir) throws IOException
{
    File folder = new File(dir);
    folder.mkdirs();

    PdfReader reader = new PdfReader(src);

    PdfDictionary root = reader.getCatalog();

    PdfDictionary names = root.getAsDict(PdfName.NAMES);
    System.out.println(""+names.getKeys().toString());
    PdfDictionary embedded = names.getAsDict(PdfName.EMBEDDEDFILES);
    System.out.println(""+embedded.toString());

    PdfArray filespecs = embedded.getAsArray(PdfName.NAMES);

    // Debug output of the first name tree key ("root1" was undefined in the posted code)
    System.out.println(filespecs.getAsString(0));
    for (int i = 0; i < filespecs.size();)
    {
        extractAttachment(reader, folder, filespecs.getAsString(i++),
                filespecs.getAsDict(i++));
    }
}

protected static void extractAttachment(PdfReader reader, File dir, PdfString name, PdfDictionary filespec)
        throws IOException
{
    PRStream stream;
    FileOutputStream fos;
    String filename;
    PdfArray parent;
    PdfDictionary refs = filespec.getAsDict(PdfName.EF);
    //System.out.println(""+refs.getKeys().toString());

    for (Object key : refs.getKeys())
    {
        stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject((PdfName) key));

        filename = filespec.getAsString((PdfName) key).toString();

        // System.out.println("" + filename);
        fos = new FileOutputStream(new File(dir, filename));
        fos.write(PdfReader.getStreamBytes(stream));
        fos.flush();
        fos.close();
    }
}


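The folder structure that the OP is trying to replicate when extracting the portfolio files is specified in an Adobe® supplement to the PDF specification (ExtensionLevel 3). It is therefore not part of the current PDF standard, and PDF processing software is not required to understand this kind of information. It does look like it is scheduled to be added to the upcoming PDF-2 (ISO 32000-2) standard, though.

Thus, to extract the portfolio files into the associated folder structure, we have to retrieve the folder information as specified in the Adobe® supplement: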

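Beginning with extension level 3, a portable collection can contain a Folders object for the purpose of organizing files into a hierarchical structure. The structure is represented by a tree with a single root folder acting as the common ancestor for all other folders and files in the collection. The single root folder is referenced in the Folders entry of Table 8.6 on page 29.

Table 8.6c describes the entries in a folder dictionary.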

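`ID`: integer (Required; ExtensionLevel 3). A non-negative integer value representing the unique folder identification number. Two folders shall not share the same `ID` value. The folder `ID` value appears as part of the name tree key of any file associated with this folder. A detailed description of the association between folders and files can be found after this table.

`Name`: text string (Required; ExtensionLevel 3). A file name representing the name of the folder. Two sibling folders shall not share the same name following case normalization.

`Child`: dictionary (Required if the folder has any descendants; ExtensionLevel 3). An indirect reference to the first child folder of this folder.

`Next`: dictionary (Required for all but the last item at each level; ExtensionLevel 3). An indirect reference to the next sibling folder at this level.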

(section 8.2.4 Collections)
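To make the `Child`/`Next` linkage easier to picture, here is a minimal JDK-only sketch that does not use iText at all. The `Folder` class is a hypothetical stand-in for a folder dictionary, and `FolderTreeSketch`, `collect` and `folderPaths` are names made up for this illustration. It only shows how following `Next` across siblings and `Child` into a level yields a map from folder `ID` to directory path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class FolderTreeSketch {

    /** Hypothetical stand-in for a folder dictionary with its ID, Name, Child and Next entries. */
    static final class Folder {
        final int id;
        final String name;
        Folder child; // first child folder, or null if there are no subfolders
        Folder next;  // next sibling folder, or null for the last one on this level

        Folder(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Walks the siblings via "next" in a loop and descends into "child" recursively. */
    static void collect(Folder first, Path parentDir, Map<Integer, Path> out) {
        for (Folder f = first; f != null; f = f.next) {
            Path dir = parentDir.resolve(f.name);
            out.put(f.id, dir);
            collect(f.child, dir, out);
        }
    }

    /** Maps every folder ID reachable from the root folder to its path below baseDir. */
    static Map<Integer, Path> folderPaths(Folder root, Path baseDir) {
        Map<Integer, Path> out = new LinkedHashMap<>();
        collect(root, baseDir, out);
        return out;
    }

    public static void main(String[] args) {
        // Small made-up tree: Root -> (Folder 1 -> Folder 11), Folder 2
        Folder root = new Folder(0, "Root");
        Folder f1 = new Folder(1, "Folder 1");
        Folder f2 = new Folder(2, "Folder 2");
        Folder f11 = new Folder(3, "Folder 11");
        root.child = f1;
        f1.next = f2;
        f1.child = f11;

        folderPaths(root, Paths.get("out"))
                .forEach((id, path) -> System.out.println(id + " -> " + path));
    }
}
```

Walking the siblings in a loop rather than recursively keeps the recursion depth bounded by the nesting depth of the folders instead of the number of siblings. The sketch assumes the tree is well formed, so a cyclic `Next` chain would make it loop forever.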

Using iText, the folder information can be retrieved and applied for example like this:

static Map<Integer, File> retrieveFolders(PdfReader reader, File baseDir) throws DocumentException
{
    Map<Integer, File> result = new HashMap<Integer, File>();

    PdfDictionary root = reader.getCatalog();
    PdfDictionary collection = root.getAsDict(PdfName.COLLECTION);
    if (collection == null)
        throw new DocumentException("Document has no Collection dictionary");
    PdfDictionary folders = collection.getAsDict(FOLDERS);
    if (folders == null)
        throw new DocumentException("Document collection has no folders dictionary");

    collectFolders(result, folders, baseDir);

    return result;
}

static void collectFolders(Map<Integer, File> collection, PdfDictionary folder, File baseDir)
{
    PdfString name = folder.getAsString(PdfName.NAME);
    File folderDir = new File(baseDir, name.toString());
    folderDir.mkdirs();
    PdfNumber id = folder.getAsNumber(PdfName.ID);
    collection.put(id.intValue(), folderDir);

    PdfDictionary next = folder.getAsDict(PdfName.NEXT);
    if (next != null)
        collectFolders(collection, next, baseDir);
    PdfDictionary child = folder.getAsDict(CHILD);
    if (child != null)
        collectFolders(collection, child, folderDir);
}

final static PdfName FOLDERS = new PdfName("Folders");
final static PdfName CHILD = new PdfName("Child");

public static void extractAttachmentsWithFolders(PdfReader reader, String dir) throws IOException, DocumentException
{
    File folder = new File(dir);
    folder.mkdirs();

    Map<Integer, File> folders = retrieveFolders(reader, folder);

    PdfDictionary root = reader.getCatalog();

    PdfDictionary names = root.getAsDict(PdfName.NAMES);
    System.out.println("" + names.getKeys().toString());
    PdfDictionary embedded = names.getAsDict(PdfName.EMBEDDEDFILES);
    System.out.println("" + embedded.toString());

    PdfArray filespecs = embedded.getAsArray(PdfName.NAMES);

    for (int i = 0; i < filespecs.size();)
    {
        extractAttachment(reader, folders, folder, filespecs.getAsString(i++), filespecs.getAsDict(i++));
    }
}

protected static void extractAttachment(PdfReader reader, Map<Integer, File> dirs, File dir, PdfString name, PdfDictionary filespec) throws IOException
{
    PRStream stream;
    FileOutputStream fos;
    String filename;
    PdfDictionary refs = filespec.getAsDict(PdfName.EF);

    File dirHere = dir;
    String nameString = name.toUnicodeString();
    if (nameString.startsWith("<"))
    {
        int closing = nameString.indexOf('>');
        if (closing > 0)
        {
            int folderId = Integer.parseInt(nameString.substring(1, closing));
            File folderFile = dirs.get(folderId);
            if (folderFile != null)
                dirHere = folderFile;
        }
    }

    for (PdfName key : refs.getKeys())
    {
        stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));

        filename = filespec.getAsString(key).toString();

        fos = new FileOutputStream(new File(dirHere, filename));
        fos.write(PdfReader.getStreamBytes(stream));
        fos.flush();
        fos.close();
    }
}
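The code above relies on the name tree key of a file inside a folder starting with the folder ID in angle brackets, and `Integer.parseInt` would throw if the text between the brackets is not a number. As a hedged, JDK-only sketch of a more defensive variant (the class `FolderKeySketch` and both method names are made up for this illustration, and the `<id>` prefix format is taken from the code above rather than checked against the specification text):

```java
import java.util.OptionalInt;

public class FolderKeySketch {

    /** Returns the folder ID encoded as a "<id>" prefix of a name tree key, or empty if there is none. */
    static OptionalInt folderIdOf(String key) {
        if (key == null || !key.startsWith("<")) {
            return OptionalInt.empty();
        }
        int closing = key.indexOf('>');
        if (closing <= 1) {
            // No closing bracket at all, or nothing between the brackets
            return OptionalInt.empty();
        }
        try {
            int id = Integer.parseInt(key.substring(1, closing));
            // Folder IDs are non-negative integers
            return id >= 0 ? OptionalInt.of(id) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            // Treat keys like "<draft>notes.txt" as plain names without a folder
            return OptionalInt.empty();
        }
    }

    /** Returns the key with a valid "<id>" prefix removed, or the key unchanged if it has none. */
    static String nameWithoutPrefix(String key) {
        return folderIdOf(key).isPresent() ? key.substring(key.indexOf('>') + 1) : key;
    }

    public static void main(String[] args) {
        // Made-up keys for illustration only
        for (String key : new String[] {"<3>Test.pdf", "ThumbImpression.pdf", "<draft>notes.txt"}) {
            System.out.println(key + " -> folder " + folderIdOf(key) + ", name " + nameWithoutPrefix(key));
        }
    }
}
```

A key without a usable prefix simply maps to no folder, which matches the fallback in `extractAttachment` above, where such files end up in the base directory.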

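Applying these methods to a sample PDF (e.g. using the test method testSamplePortfolio11Folders), one gets:

Root
│   ThumbImpression.pdf
│
├───Folder 1
│   │   EStampPdf.pdf
│   │   Presentation.pdf
│   │
│   ├───Folder 11
│   │   │   Test.pdf
│   │   │
│   │   └───Folder 111
│   └───Folder 12
└───Folder 2
        SealDeed.pdf

Comments:

Hello and welcome to StackOverflow! Please take some time to read how to ask a good question.

Did you look at the PDF with iText RUPS? Were you able to discover how the folder structure is defined inside the PDF? As @Mauker explained, your question needs more work.

Hi, I have not looked at iText RUPS. If there is any method or class that helps me extract the folder structure, please share it. I searched for such a method or class after your recommendation but did not find anything. Secondly, when I searched the PdfName variables, all I learned is that the structure is defined in the PDF with Collection as the key. There is no variable that gives /Folders, and even when I tried new PdfName("Folders") I got nothing but a null pointer exception every time.

@Nitesh As Mauker and Bruno already hinted, your question needs more detail. Please share a sample PDF and the code with which you have tried to extract the folder structure so far. Your previous comment indicates that either the portfolio in question does not contain folders (in the sense of Adobe ExtensionLevel 3 of the PDF specification) but only something that looks like folders, or your code is wrong. So both the PDF and the code are needed.

Can you tell me your mail id so that I can share the PDF file? It is not possible to attach it here. I will attach a screenshot of the same PDF with its folders.

Hi, I get the files and the folders separately. First I get all the folders, then the files end up outside the folders. I need the files inside the folders. Please suggest? When checking if (nameString.startsWith("<")) I get false.

Did you copy the code from my answer? Did you use it on the sample file? When I run the unit test linked above, I definitely get the files inside the folders. Just use the Java file I linked in my answer.

When checking if (name) I get false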