Java 使用iText读取文件夹中的多个PDF文件_Java_Pdf_Itext

Java 使用iText读取文件夹中的多个PDF文件

java pdf itext

Java 使用iText读取文件夹中的多个PDF文件,java,pdf,itext,Java,Pdf,Itext,我在java中使用iText读取PDF文件时遇到问题。我知道如何阅读PDF文件中的页面现在，我想从一个文件夹中读取多个PDF文件。我怎样才能完成这项任务 import java.io.FileOutputStream; import java.io.IOException; import java.io.PrintWriter; import part1.chapter01.HelloWorld; import com.itextpdf.text.Document; import com.

我在java中使用iText读取PDF文件时遇到问题。我知道如何阅读PDF文件中的页面

现在，我想从一个文件夹中读取多个PDF文件。

我怎样才能完成这项任务

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;

import part1.chapter01.HelloWorld;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.io.RandomAccessSourceFactory;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PRTokeniser;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfTemplate;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.parser.ContentByteUtils;
import com.itextpdf.text.pdf.parser.PdfContentStreamProcessor;
import com.itextpdf.text.pdf.parser.RenderListener;

public class ParsingHelloWorld {

    /** The resulting PDF. */
    public static final String PDF = "results/part4/chapter15/hello_reverse.pdf";
    /** A possible resulting after parsing the PDF. */
    public static final String TEXT1 = "results/part4/chapter15/result1.txt";
    /** A possible resulting after parsing the PDF. */
    public static final String TEXT2 = "results/part4/chapter15/result2.txt";
    /** A possible resulting after parsing the PDF. */
    public static final String TEXT3 = "results/part4/chapter15/result3.txt";

    /**
     * Generates a PDF file with the text 'Hello World'
     * @throws DocumentException 
     * @throws IOException 
     */
    public void createPdf(String filename) throws DocumentException, IOException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer
          = PdfWriter.getInstance(document, new FileOutputStream(filename));
        // step 3
        document.open();
        // step 4
        // we add the text to the direct content, but not in the right order
        PdfContentByte cb = writer.getDirectContent();
        BaseFont bf = BaseFont.createFont();
        cb.beginText();
        cb.setFontAndSize(bf, 12);
        cb.moveText(88.66f, 367); 
        cb.showText("ld");
        cb.moveText(-22f, 0); 
        cb.showText("Wor");
        cb.moveText(-15.33f, 0); 
        cb.showText("llo");
        cb.moveText(-15.33f, 0); 
        cb.showText("He");
        cb.endText();
        // we also add text in a form XObject
        PdfTemplate tmp = cb.createTemplate(250, 25);
        tmp.beginText();
        tmp.setFontAndSize(bf, 12);
        tmp.moveText(0, 7);
        tmp.showText("Hello People");
        tmp.endText();
        cb.addTemplate(tmp, 36, 343);
        // step 5
        document.close();
    }

    /**
     * Parses the PDF using PRTokeniser
     * @param src  the path to the original PDF file
     * @param dest the path to the resulting text file
     * @throws IOException
     */
    public void parsePdf(String src, String dest) throws IOException {
        PdfReader reader = new PdfReader(src);
        // we can inspect the syntax of the imported page
        byte[] streamBytes = reader.getPageContent(1);
        PRTokeniser tokenizer = new PRTokeniser(new RandomAccessFileOrArray(new RandomAccessSourceFactory().createSource(streamBytes)));
        PrintWriter out = new PrintWriter(new FileOutputStream(dest));
        while (tokenizer.nextToken()) {
            if (tokenizer.getTokenType() == PRTokeniser.TokenType.STRING) {
                out.println(tokenizer.getStringValue());
            }
        }
        out.flush();
        out.close();
        reader.close();
    }

    /**
     * Extracts text from a PDF document.
     * @param src  the original PDF document
     * @param dest the resulting text file
     * @throws IOException
     */
    public void extractText(String src, String dest) throws IOException {
        PrintWriter out = new PrintWriter(new FileOutputStream(dest));
        PdfReader reader = new PdfReader(src);
        RenderListener listener = new MyTextRenderListener(out);
        PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
        PdfDictionary pageDic = reader.getPageN(1);
        PdfDictionary resourcesDic = pageDic.getAsDict(PdfName.RESOURCES);
        processor.processContent(ContentByteUtils.getContentBytesForPage(reader, 1), resourcesDic);
        out.flush();
        out.close();
        reader.close();
    }

    /**
     * Main method.
     * @param    args    no arguments needed
     * @throws DocumentException 
     * @throws IOException
     */
    public static void main(String[] args) throws DocumentException, IOException {
        ParsingHelloWorld example = new ParsingHelloWorld();
        HelloWorld.main(args);
        example.createPdf(PDF);
        example.parsePdf(HelloWorld.RESULT, TEXT1);
        example.parsePdf(PDF, TEXT2);
        example.extractText(PDF, TEXT3);
    }
}

取自

iTextPDF

网站。这将向您展示如何阅读单张PDF。如果要从文件夹中读取多个PDF，则需要一个

DirectoryStream

DirectoryStream<Path> allPDF = Files.newDirectoryStream(Paths.get("/path/to/folder"),"*.pdf");

DirectoryStream allPDF=Files.newDirectoryStream（path.get（“/path/to/folder”），“*.pdf”）；

现在，

DirectoryStream

只包含那些与文件夹中的PDF文件相对应的

Path

对象

迭代

目录流

，获取

路径

对象的绝对路径，然后执行任何您想要的操作。

您尝试了什么？请提供您的代码。也是你的例外。我想从一个文件夹中读取多个PDF文件。-什么意思？是否要同时打开多个pdfreader实例？有什么问题？