Java Selenium WebDriver: decryption error when extracting data from a PDF

java pdf encryption selenium-webdriver

I am trying to extract text from a URL that serves a PDF file, but I get the following error:

INFO: Document is encrypted
May 27, 2015 9:27:50 AM org.apache.pdfbox.filter.FlateFilter decode

public void getTextFromPdf(String urlS) throws IOException {
        driver.get(urlS);
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        URL url = new URL(driver.getCurrentUrl());
        BufferedInputStream fileToParse = new BufferedInputStream(url.openStream());

        // parse() parses the stream and populates the COSDocument object,
        // which is the in-memory representation of the PDF document.
        PDFParser parser = new PDFParser(fileToParse);
        parser.parse();

        // getPDDocument() returns the PDDocument that was parsed; call close() on it when done to release resources.
        // PDFTextStripper takes a PDF document, strips out all of the text, and ignores the formatting.
        System.out.println(urlS);
        String output = new PDFTextStripper().getText(parser.getPDDocument());
        System.out.println(output);
        parser.getPDDocument().close();
        driver.manage().timeouts().implicitlyWait(100, TimeUnit.SECONDS);
}

Use this code to get it working with PDFBox:

        PDDocument doc = PDDocument.loadNonSeq(fileToParse, null);
        String output = new PDFTextStripper().getText(doc);
        doc.close();

For the dependencies, see the PDFBox documentation, or use the pdfbox-app jar file instead.
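Before changing the parsing code, it can help to confirm what the URL actually returned. The sketch below is only a rough diagnostic and not part of PDFBox: `PdfSniffer`, `looksLikePdf`, and `mentionsEncryption` are hypothetical names, and it assumes you have already read the downloaded stream into a byte array. It checks for the `%PDF-` header at offset 0 and scans for an `/Encrypt` key. The scan is a heuristic, since the same characters could also appear in ordinary content.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; it does not parse or decrypt the PDF.
final class PdfSniffer {

    // True if the data starts with the "%PDF-" header at offset 0.
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Heuristic: true if the raw bytes contain the text "/Encrypt" anywhere.
    static boolean mentionsEncryption(byte[] data) {
        // ISO_8859_1 maps every byte to exactly one char, so binary data is not mangled.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.contains("/Encrypt");
    }

    public static void main(String[] args) {
        // Made-up sample bytes standing in for a downloaded file.
        byte[] sample = "%PDF-1.4\ntrailer << /Encrypt 5 0 R >>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Looks like a PDF: " + looksLikePdf(sample));
        System.out.println("Mentions /Encrypt: " + mentionsEncryption(sample));
    }
}
```

If `looksLikePdf` returns false, the server probably sent something other than the PDF (for example an HTML page), and no PDFBox setting will help. If `mentionsEncryption` returns true, the dependency and password questions become the more likely culprits.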

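Have you added the required BouncyCastle jars to your classpath?

No, I have not put any BouncyCastle jar on my classpath. In fact, I am new to PDFBox and Selenium WebDriver.

As far as I know, PDFBox needs the BouncyCastle security provider for encryption and signing. Check the PDFBox documentation for the dependencies; the exact version required depends on your PDFBox version.

I get the same error when I use this code. Please give a complete solution.

@AbhaySingh Could it be that the file is encrypted with a non-empty password, or that you used load() instead of loadNonSeq()?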
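To answer the classpath question quickly, you can ask the JVM whether it can see the BouncyCastle provider class. This is a minimal sketch using only the standard library: `ClasspathCheck` and `isClassPresent` are hypothetical names, and it assumes the provider class is `org.bouncycastle.jce.provider.BouncyCastleProvider`, as shipped in the BouncyCastle provider jar.

```java
// Hypothetical helper, not part of PDFBox or BouncyCastle.
final class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: only look the class up, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed provider class name from the BouncyCastle provider jar.
        String bc = "org.bouncycastle.jce.provider.BouncyCastleProvider";
        System.out.println("BouncyCastle on classpath: " + isClassPresent(bc));
    }
}
```

Run it with the same classpath as your Selenium test. If it prints `false`, the jar is missing; if it prints `true`, the version may still not match what your PDFBox release expects.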