Java: Downloading an HTTPS page

java https

I'm trying to download the contents of a web page with this code, but the result is not what Firefox shows:

URL url = new URL("https://jumpseller.cl/support/webpayplus/");
InputStream is = url.openStream();
Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);

When I inspect /tmp/asdfasdf, it does not contain the page's HTML source, only binary bytes (no text). In Firefox, however, I can still view the page and its source.

How can I get the actual web page?

You need to check the response headers. The page is compressed: the Content-Encoding header has the value gzip.

Try this:

URL url = new URL("https://jumpseller.cl/support/webpayplus/");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();

// The server sends a gzip-compressed body, so decompress it while reading.
if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
    is = new GZIPInputStream(is);
}

Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);
is.close(); // Files.copy does not close the input stream for you.
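
If you want to check the decompression step without hitting the network, here is a minimal sketch of the same idea. The class name GzipAwareBody and its helpers decode, readBody and gzip are made up for this illustration, and it assumes Java 9+ (for readAllBytes) and a UTF-8 page. The gzip helper only stands in for the server by compressing a string in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
class GzipAwareBody {

    // Wraps the raw body in a GZIPInputStream when the Content-Encoding value is gzip.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the (possibly compressed) body completely and returns it as text.
    // Assumption: the page is UTF-8 encoded.
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        try (InputStream in = decode(raw, contentEncoding)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Stand-in for the server: gzip-compresses bytes in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>hello</body></html>";
        byte[] plain = html.getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(plain);

        // Compressed body with a gzip Content-Encoding value.
        System.out.println(readBody(new ByteArrayInputStream(compressed), "gzip"));
        // Uncompressed body with no Content-Encoding header (getContentEncoding() returns null).
        System.out.println(readBody(new ByteArrayInputStream(plain), null));
    }
}
```

With a real connection you would pass conn.getInputStream() and conn.getContentEncoding() to readBody. The null check matters because getContentEncoding() returns null when the header is absent.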

Use the HtmlUnit library with the following code:

    try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
        // Silence HtmlUnit's verbose logging.
        java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // Disables certificate validation; only use this for hosts you trust.
        webClient.getOptions().setUseInsecureSSL(true);
        HtmlPage page = webClient.getPage("https://jumpseller.cl/support/webpayplus/");
        // Give background JavaScript up to 5 seconds to finish once the page has loaded.
        webClient.waitForBackgroundJavaScript(5 * 1000);
        String stringToSave = page.asXml(); // The full HTML as a string; save it to a file if needed.
        // No explicit close() is needed: try-with-resources closes the client.
    }

I work at Jumpseller.cl. Feel free to email us and we can provide the full contents of the file (provided that you give us proper credit).