Downloading an entire webpage with Java


java javascript download scroll


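Using HTMLEditorKit it is possible to download an entire webpage. However, I need to download a webpage that only loads all of its content as you scroll. That behaviour is usually implemented with JavaScript combined with Ajax.
Q.1: Is there a way to trick the target webpage into delivering all of its content using only Java?
Q.2: If this is not possible in Java alone, is it possible in combination with JavaScript?
For reference, this is what I wrote: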

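You can do this with IDM's grabber.
This should help:
Yes, you can download a webpage locally with Java code. You cannot download static HTML content to disk with JavaScript, because JavaScript does not provide file-creation facilities the way Java does.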

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpDownloadUtility {
    private static final int BUFFER_SIZE = 4096;

    /**
     * Downloads a file from a URL
     * @param fileURL HTTP URL of the file to be downloaded
     * @param saveDir path of the directory to save the file
     * @throws IOException if the connection or the file write fails
     */
    public static void downloadFile(String fileURL, String saveDir) throws IOException {
        URL url = new URL(fileURL);
        HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
        try {
            int responseCode = httpConn.getResponseCode();

            // always check HTTP response code first
            if (responseCode != HttpURLConnection.HTTP_OK) {
                System.out.println("No file to download. Server replied HTTP code: " + responseCode);
                return;
            }

            String fileName = "";
            String disposition = httpConn.getHeaderField("Content-Disposition");
            String contentType = httpConn.getContentType();
            int contentLength = httpConn.getContentLength();

            if (disposition != null) {
                // extracts file name from header field, quoted or unquoted
                int index = disposition.indexOf("filename=");
                if (index >= 0) {
                    fileName = disposition.substring(index + "filename=".length());
                    int semicolon = fileName.indexOf(';');
                    if (semicolon >= 0) {
                        fileName = fileName.substring(0, semicolon);
                    }
                    fileName = fileName.replace("\"", "").trim();
                }
            }
            if (fileName.isEmpty()) {
                // extracts file name from the URL path (ignores any query string)
                String path = url.getPath();
                fileName = path.substring(path.lastIndexOf('/') + 1);
            }
            if (fileName.isEmpty()) {
                // the URL points at a directory or the site root
                fileName = "index.html";
            }
            // keeps only the last path segment so the header cannot escape saveDir
            fileName = new File(fileName).getName();

            System.out.println("Content-Type = " + contentType);
            System.out.println("Content-Disposition = " + disposition);
            System.out.println("Content-Length = " + contentLength);
            System.out.println("fileName = " + fileName);

            String saveFilePath = saveDir + File.separator + fileName;

            // copies the HTTP response body into the file; both streams are closed automatically
            try (InputStream inputStream = httpConn.getInputStream();
                 OutputStream outputStream = new FileOutputStream(saveFilePath)) {
                byte[] buffer = new byte[BUFFER_SIZE];
                int bytesRead;
                while ((bytesRead = inputStream.read(buffer)) != -1) {
                    outputStream.write(buffer, 0, bytesRead);
                }
            }
            System.out.println("File downloaded");
        } finally {
            httpConn.disconnect();
        }
    }
}

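You can do this with the Selenium WebDriver Java classes.

WebDriver is normally used for testing, but it can simulate a user scrolling down the page until the page stops changing. You can then save the content to a file with Java code.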
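The "scroll until the page stops changing" idea can be sketched independently of any browser library. In the sketch below, `ScrollablePage`, `ScrollUntilStable` and all parameter names are hypothetical and not part of Selenium. With Selenium, the three interface methods would presumably be backed by `JavascriptExecutor.executeScript(...)` (to scroll and to read `document.body.scrollHeight`) and `WebDriver.getPageSource()`; that wiring is an assumption and is left out so the loop stays runnable without a browser.

```java
/** Hypothetical abstraction over a live browser page (not a Selenium type). */
interface ScrollablePage {
    void scrollToBottom();   // e.g. run "window.scrollTo(0, document.body.scrollHeight)"
    long contentHeight();    // e.g. read document.body.scrollHeight
    String source();         // e.g. WebDriver.getPageSource()
}

final class ScrollUntilStable {
    /**
     * Scrolls until the page height has stayed the same for stableRounds
     * consecutive scrolls, or until maxScrolls is reached, then returns the source.
     */
    static String loadAll(ScrollablePage page, int maxScrolls, int stableRounds, long pauseMillis) {
        long lastHeight = page.contentHeight();
        int unchanged = 0;
        for (int i = 0; i < maxScrolls && unchanged < stableRounds; i++) {
            page.scrollToBottom();
            try {
                Thread.sleep(pauseMillis); // give the Ajax requests time to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop scrolling but still return what has loaded so far
            }
            long height = page.contentHeight();
            if (height == lastHeight) {
                unchanged++;
            } else {
                unchanged = 0;
                lastHeight = height;
            }
        }
        return page.source();
    }
}
```

The returned string can then be written to disk with ordinary Java file I/O. The `maxScrolls` cap matters on feeds that never end, and requiring more than one stable round guards against stopping too early when a single Ajax request is slow.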
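Use the HtmlUnit library to fetch all the text and the image/CSS files.
HtmlUnit: [link] htmlunit.sourceforge.net
1) To download the text content, use the code at the links below:
All text content: [link]
A specific tag, such as span: [link]

2) To fetch images/files, use the following: [link]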
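Can you give an example of such a site/page?
Does my question make sense to you?
I'm really busy right now, but I'll get back to this topic as soon as possible (within 7 hours). Once I have studied the solution you proposed, your help will be rewarded. Thanks for your understanding.
Great, it worked. However, I tested it on 9gag.com and it did not download the entire content. If you scroll on 9gag for about 30 seconds, you reach the bottom of the page. Before that point there are many images, and none of their URLs ending in .jpg or .gif appear in the file the code downloads.
I think your approach may be the only one offered here... If no more efficient code turns up, the bounty is yours. Thanks.
There is software that can download an entire page together with its CSS, JS, images and fonts. However, with a Java program like this one you can only download what is served at the URL itself (here, just the HTML code).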
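Since the downloader only saves the HTML served at the URL, a second pass is needed to fetch the images and stylesheets that the HTML refers to. The following is a minimal sketch of that pass, assuming the HTML is already in memory as a string; `AssetLinks`, `collect` and the extension list are hypothetical names and choices, and the regular expression is a rough stand-in for a real HTML parser. It will not find images that JavaScript adds only after scrolling, such as the ones missing in the 9gag test, unless the HTML was captured after those scripts had run.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AssetLinks {
    // Captures quoted values of src=... and href=... (this also catches names ending in "src", e.g. data-src).
    private static final Pattern ATTR =
            Pattern.compile("(?:src|href)\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Keeps only references that look like images, stylesheets or scripts; a query string is allowed.
    private static final Pattern ASSET =
            Pattern.compile(".*\\.(?:jpe?g|gif|png|css|js)(?:\\?.*)?$", Pattern.CASE_INSENSITIVE);

    /** Returns absolute asset URLs in document order, without duplicates. */
    static Set<String> collect(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> result = new LinkedHashSet<>();
        Matcher m = ATTR.matcher(html);
        while (m.find()) {
            String ref = m.group(1).trim();
            if (!ASSET.matcher(ref).matches()) {
                continue;
            }
            try {
                result.add(base.resolve(ref).toString()); // resolve relative references against the page URL
            } catch (IllegalArgumentException ignored) {
                // skip references that are not valid URIs
            }
        }
        return result;
    }
}
```

Each URL in the returned set could then be passed to `HttpDownloadUtility.downloadFile` to save the asset next to the HTML file.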