How to parse multiple PDFs in a folder into text in Java

java pdf


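I have a folder with a lot of PDF files. I need to convert all of them to txt files and save those text files in a different folder. I would like to do this in Java.

I have this code for parsing a PDF, but it only handles one file at a time, and I need to process a folder containing thousands of PDFs: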

 PDFTextStripper pdfStripper = null;
 PDDocument pdDoc = null;
 COSDocument cosDoc = null;
 File file = new File("C:/my.pdf");

 try {
     PDFParser parser = new PDFParser(new FileInputStream(file));
     parser.parse();
     cosDoc = parser.getDocument();
     pdfStripper = new PDFTextStripper();
     pdDoc = new PDDocument(cosDoc);
     pdfStripper.setStartPage(1);
     pdfStripper.setEndPage(20);
     String parsedText = pdfStripper.getText(pdDoc);
 } catch (IOException e) {
     e.printStackTrace();
 }

Any ideas?

You can try something like this:

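// Written against the PDFBox 1.8.x API, like the code in the question.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

StringBuilder parsedText = new StringBuilder(); // the text of every PDF is appended here
File folder = new File("/yourFolder"); // folder that holds all the PDF files
File[] listOfFiles = folder.listFiles(); // null if the folder is missing or unreadable

if (listOfFiles != null) {
    for (File file : listOfFiles) { // go through every entry in the folder
        if (!file.isFile() || !file.getName().toLowerCase().endsWith(".pdf")) {
            continue; // skip subfolders and non-PDF files
        }
        PDDocument pdDoc = null;
        try { // same steps as for a single file
            PDFParser parser = new PDFParser(new FileInputStream(file));
            parser.parse();
            COSDocument cosDoc = parser.getDocument();
            pdDoc = new PDDocument(cosDoc);
            PDFTextStripper pdfStripper = new PDFTextStripper();
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(pdDoc.getNumberOfPages()); // always read through the last page
            parsedText.append(pdfStripper.getText(pdDoc)).append(System.lineSeparator());
        } catch (IOException e) {
            e.printStackTrace(); // report the failure and carry on with the next file
        } finally {
            if (pdDoc != null) {
                try { pdDoc.close(); } catch (IOException ignored) { }
            }
        }
    }
}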

Put the code above inside a loop that iterates over the files. Instead of a single file name, use the folder name and the listFiles() method.

Thanks!! As a follow-up, is there a way to save each newly parsed file separately instead of as one big text file?

Glad I could help :) You can save "parsedText" to a text file at the end of each loop iteration instead of appending to it.

Thanks! I'll give it a try.
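
The suggestion to save each PDF's text to its own file, rather than appending everything to one string, could look roughly like the sketch below. It is not taken from the thread: the names PdfFolderToText, TextExtractor, convertFolder and txtNameFor are made up for illustration, and the PDF parsing itself is passed in as a function so that the folder handling can be shown with the standard library alone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

class PdfFolderToText {

    /** Hypothetical hook: turns one PDF file into its text (e.g. via PDFBox). */
    @FunctionalInterface
    interface TextExtractor {
        String extract(Path pdf) throws IOException;
    }

    /** Maps "report.pdf" to "report.txt"; only the last extension is replaced. */
    static String txtNameFor(String pdfName) {
        int dot = pdfName.lastIndexOf('.');
        String base = (dot > 0) ? pdfName.substring(0, dot) : pdfName;
        return base + ".txt";
    }

    /**
     * Writes one .txt file into outDir for every .pdf file directly inside inDir.
     * A file whose extraction fails is reported and skipped, so one bad PDF
     * does not stop a run over thousands of files.
     *
     * @return the number of PDFs that were converted
     */
    static int convertFolder(Path inDir, Path outDir, TextExtractor extractor)
            throws IOException {
        Files.createDirectories(outDir); // create the output folder if needed
        int converted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                boolean isPdf = name.toLowerCase(Locale.ROOT).endsWith(".pdf");
                if (!Files.isRegularFile(entry) || !isPdf) {
                    continue; // skip subfolders and non-PDF files
                }
                try {
                    String text = extractor.extract(entry);
                    Path target = outDir.resolve(txtNameFor(name));
                    Files.write(target, text.getBytes(StandardCharsets.UTF_8));
                    converted++;
                } catch (IOException e) {
                    System.err.println("Skipping " + entry + ": " + e.getMessage());
                }
            }
        }
        return converted;
    }
}
```

To use it, call convertFolder with the input folder, the output folder, and an extractor. With PDFBox 1.8.x or 2.x the extractor could load the file with PDDocument.load(pdf.toFile()), return new PDFTextStripper().getText(doc), and close the document afterwards. As far as I know, PDFBox 3.x loads documents through Loader.loadPDF instead, so adjust for the version you have.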