Java: download an entire web page

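Using HTMLEditorKit, it is possible to download an entire web page. However, I need to download a web page that has to be scrolled before all of its content is loaded. This technique is usually implemented with JavaScript combined with Ajax.

Q.: Is there any way to trick the target web page so that its entire content can be downloaded using only Java?

Q.2: If this is not possible with Java alone, is it possible in combination with JavaScript?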

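Just a quick note, this is what I have written:

You can do this with IDM's grabber.

This will help:

Yes, you can download a web page locally with Java code. You cannot save static HTML content to disk with JavaScript alone, because JavaScript does not offer the file-creation facilities that Java does.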

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;


public class HttpDownloadUtility {
    private static final int BUFFER_SIZE = 4096;

    /**
     * Downloads a file from a URL
     * @param fileURL HTTP URL of the file to be downloaded
     * @param saveDir path of the directory to save the file
     * @throws IOException
     */
    public static void downloadFile(String fileURL, String saveDir)
            throws IOException {
        URL url = new URL(fileURL);
        HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
        int responseCode = httpConn.getResponseCode();

        // always check HTTP response code first
        if (responseCode == HttpURLConnection.HTTP_OK) {
            String fileName = "";
            String disposition = httpConn.getHeaderField("Content-Disposition");
            String contentType = httpConn.getContentType();
            int contentLength = httpConn.getContentLength();

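            if (disposition != null) {
                // extracts file name from header field, quoted or unquoted
                int index = disposition.indexOf("filename=");
                if (index >= 0) {
                    fileName = disposition.substring(index + "filename=".length())
                            .replace("\"", "").trim();
                }
            }
            if (fileName.isEmpty()) {
                // falls back to the last path segment of the URL
                fileName = fileURL.substring(fileURL.lastIndexOf("/") + 1);
            }
            if (fileName.isEmpty()) {
                // URL ends with "/" (typical for a web page): assumed default name
                fileName = "index.html";
            }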

            System.out.println("Content-Type = " + contentType);
            System.out.println("Content-Disposition = " + disposition);
            System.out.println("Content-Length = " + contentLength);
            System.out.println("fileName = " + fileName);

            String saveFilePath = saveDir + File.separator + fileName;

            // try-with-resources closes both streams even if the copy fails
            try (InputStream inputStream = httpConn.getInputStream();
                 FileOutputStream outputStream = new FileOutputStream(saveFilePath)) {
                int bytesRead;
                byte[] buffer = new byte[BUFFER_SIZE];
                while ((bytesRead = inputStream.read(buffer)) != -1) {
                    outputStream.write(buffer, 0, bytesRead);
                }
            }

            System.out.println("File downloaded");
        } else {
            System.out.println("No file to download. Server replied HTTP code: " + responseCode);
        }
        httpConn.disconnect();
    }
}

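You can achieve this with the Selenium WebDriver Java classes.

WebDriver is normally used for testing, but it can simulate a user scrolling down the page until the page stops changing; you can then use Java code to save the content to a file.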
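The "scroll until the page stops changing" idea can be sketched without any browser dependency. The following is only an illustration under stated assumptions: `ScrollUntilStable`, `scrollToEnd`, and their parameters are hypothetical names, and the browser is represented by two callbacks (one that scrolls, one that reports the page height) so that the loop itself can be run and checked on its own.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: keeps scrolling until the page height stops growing. */
class ScrollUntilStable {

    /**
     * @param scrollDown  action that scrolls to the bottom of the page
     * @param pageHeight  supplies the current height of the page
     * @param maxScrolls  safety limit for feeds that never end
     * @param pauseMillis time to wait for Ajax content after each scroll
     * @return the last observed page height
     */
    static int scrollToEnd(Runnable scrollDown, IntSupplier pageHeight,
                           int maxScrolls, long pauseMillis) {
        int lastHeight = pageHeight.getAsInt();
        for (int i = 0; i < maxScrolls; i++) {
            scrollDown.run();
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            int newHeight = pageHeight.getAsInt();
            if (newHeight == lastHeight) {
                break; // nothing new was loaded: assume the end was reached
            }
            lastHeight = newHeight;
        }
        return lastHeight;
    }

    public static void main(String[] args) {
        // Fake page that grows by 500 for the first three scrolls only.
        int[] height = {1000};
        int[] scrolls = {0};
        int result = scrollToEnd(
                () -> { if (++scrolls[0] <= 3) height[0] += 500; },
                () -> height[0], 50, 0L);
        System.out.println("final height = " + result + ", scrolls = " + scrolls[0]);
    }
}
```

With Selenium WebDriver, the two callbacks could plausibly be backed by `JavascriptExecutor.executeScript`, for example running `window.scrollTo(0, document.body.scrollHeight);` to scroll and `return document.body.scrollHeight;` to read the height, after which `driver.getPageSource()` returns the rendered HTML to save. The pause length and the scroll limit are guesses that would need tuning per site.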

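Use the HtmlUnit library to fetch all of the text as well as the image/CSS files.

HtmlUnit [link] htmlunit.sourceforge.net

1) To download the text content, use the code at the links below:

All text content [link]

A specific tag, such as span [link]

2) To fetch images/files, use the [link] below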

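Can you give an example of such a site/page?

Does the question I asked make sense to you?

I'm really busy right now, but I'll get back to this topic as soon as possible (within 7 hours). Once I have looked into the solution you proposed, your help will be rewarded. Thanks for your understanding.

Great, it works. However, I tested it on 9gag.com and it did not download the entire content. If you scroll on 9gag for about 30 seconds, you reach the bottom of the page. Before that point there are many images whose URLs end in .jpg or .gif, and none of them appear in the file downloaded by the code provided. I think your approach may be the only one presented here... If no more effective code turns up, the bounty is yours. Thanks.

There is software that can download an entire page along with its CSS, JS, images, and fonts. With a plain Java program like this one, however, you can download only what the URL itself serves (here, just the HTML code).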
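Since a plain HTTP download returns only the HTML, the images have to be located and fetched separately. The sketch below shows one rough way to collect image URLs from already-downloaded HTML with a regular expression. `ImageLinkExtractor` and `extractImageUrls` are hypothetical names, and a regex is a swapped-in shortcut here rather than a real HTML parser, so treat it as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects the src attribute of <img> tags in an HTML string. */
class ImageLinkExtractor {

    // Rough heuristic: an <img> tag, then a quoted src attribute preceded by whitespace.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(2)); // group 2 is the text between the quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<p>hi</p><img class=\"a\" src=\"http://example.com/a.jpg\">"
                + "<IMG src='b.gif' alt=\"x\">";
        System.out.println(extractImageUrls(html));
    }
}
```

Each absolute URL found this way could then be passed to a downloader such as `HttpDownloadUtility.downloadFile`; relative URLs like `b.gif` would first need to be resolved against the page address. Images that the page only requests after scrolling will not be present in the static HTML at all, which matches what the 9gag test above ran into.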