
Java Jsoup: saving content to a database


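I have an array of URLs, and I want to store the information read from each URL in a database. My problem is that the list is very large: reading the URLs one after another and storing each one in the database takes a long time.

I know this can be done with threads, but I don't know how. Please help me; any approach is fine.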

try {
    String lstUrls = "http://www.java2s.com/Tutorials/Java/Scala/index.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0020__Scala_Variables.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0040__Scala_Variable_Declarations.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0060__Scala_Semicolons.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0080__Scala_Code_Blocks.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0090__Scala_Comments.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0100__Scala_Type_Hierarchy.htm\n";
    String[] urls = lstUrls.split("\n");
    for (String url : urls) {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        Elements select = doc.select("div.row");
        String html = select.html();
        System.out.println(html);
        /*
         insert html to database
         */
    }
} catch (IOException ex) {
    ex.printStackTrace();
}

I suggest compressing the data before inserting it into the database:

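import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Usage: preparedStatement.setBytes(1, compress(html));

public static byte[] compress(String str) throws Exception {
    if (str == null || str.length() == 0) {
        return null;
    }
    ByteArrayOutputStream obj = new ByteArrayOutputStream();
    // Closing the stream (done here by try-with-resources) writes the gzip trailer
    try (GZIPOutputStream gzip = new GZIPOutputStream(obj)) {
        gzip.write(str.getBytes("UTF-8"));
    }
    return obj.toByteArray();
}

public static String decompress(byte[] bytes) throws Exception {
    if (bytes == null) {
        return null; // compress() returns null for null or empty input
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
        // Copy raw bytes; reading with readLine() would drop the line breaks
        byte[] buffer = new byte[4096];
        int n;
        while ((n = gis.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
    return out.toString("UTF-8");
}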

The second approach is to save the HTML data to a file and store only the file path in the database:

long ts = System.currentTimeMillis();
String filePath = String.valueOf(ts) + ".gz";
saveToFile(filePath, html);
--------
public static void saveToFile(String filePath, String text) {
    // try-with-resources closes the stream even if the write fails
    try (GZIPOutputStream gzos = new GZIPOutputStream(new FileOutputStream(filePath))) {
        gzos.write(text.getBytes("UTF-8"));
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
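
For the file-based approach, here is a minimal sketch of writing a page to a gzip file and reading it back. The class name HtmlFileStore and its save/load methods are made up for illustration and are not part of the original answer. It also swaps the timestamp file name for a random UUID, on the assumption that several threads may save pages within the same millisecond.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not from the original answer.
class HtmlFileStore {

    // Writes the HTML as a gzip file under dir and returns the path,
    // which is the only value that would be stored in the database.
    static Path save(Path dir, String html) throws IOException {
        Files.createDirectories(dir);
        // A random UUID avoids the name clashes a millisecond timestamp
        // can produce when several threads save at the same moment.
        Path file = dir.resolve(UUID.randomUUID() + ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(html.getBytes(StandardCharsets.UTF_8));
        }
        return file;
    }

    // Reads a file written by save() back into a string, line breaks included.
    static String load(Path file) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pages");
        Path stored = save(dir, "<div class=\"row\">example</div>");
        System.out.println(stored + " -> " + load(stored));
    }
}
```

The database row then only needs a short text column for the path, and the page itself can be loaded on demand.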


To retrieve the data using multiple threads, you can do the following:

    Executor ex = Executors.newFixedThreadPool(3);
    String lstUrls = "http://www.java2s.com/Tutorials/Java/Scala/index.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0020__Scala_Variables.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0040__Scala_Variable_Declarations.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0060__Scala_Semicolons.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0080__Scala_Code_Blocks.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0090__Scala_Comments.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0100__Scala_Type_Hierarchy.htm\n";
    String[] urls = lstUrls.split("\n");
    for (final String url : urls) {
        try {
            ex.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        Document doc = Jsoup
                                .connect(url)
                                .userAgent(
                                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                                .get();
                        Elements select = doc.select("div.row");
                        String html = select.html();
                        System.out.println(html);
                        /*
                         * insert html to database
                         */
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
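This processes the URLs concurrently using 3 threads. If you want more than 3 threads, change the line Executor ex = Executors.newFixedThreadPool(3) and replace 3 with whatever number you need.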
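
One thing the snippet leaves open is how to tell when all pages are done. The loop only submits tasks, so code placed right after it runs before the work has finished, and the pool's non-daemon threads keep the JVM alive until the pool is shut down. A minimal sketch, assuming the pool is declared as ExecutorService rather than Executor (the class name, the 10 minute timeout and the placeholder task body are illustrative choices, not from the answer):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not from the original answer.
class PoolShutdownDemo {

    // Runs one task per URL on a fixed pool, then blocks until the pool
    // has drained. Returns the number of tasks that completed.
    static int processAll(List<String> urls, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final AtomicInteger done = new AtomicInteger();
        for (final String url : urls) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    // Placeholder for the Jsoup fetch and the database insert.
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted tasks still run
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            pool.shutdownNow(); // timed out: interrupt whatever is still running
        }
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList("url-1", "url-2", "url-3", "url-4");
        int finished = processAll(urls, 3);
        // This line runs only after the pool has terminated.
        System.out.println("Finished " + finished + " of " + urls.size() + " pages");
    }
}
```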



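Comments:

What you can do is queue the output and insert it into the database as a single batch, so that you only hit the database once.

If my answer helped you, please consider voting for it. Thanks.

Turink, done.

Sorry, English is not my native language, so I did not understand what you said. I would appreciate more detailed instructions, or a document I should study.

That's it, of course :D Thank you, Hasanaga.

I think the work would go faster if the URL list were split across multiple threads. I see you are also interested in reading web pages, so let's think about this topic together.

That's good. I think it needs to be somewhat more elaborate, so I will study your approach further. Thank you very much.

When I add a "finished" notification statement after the for (final String url : urls) {} loop, that statement runs first. Can the notification be printed only after the loop's work has completed?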
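
The batching idea from the first comment could look roughly like the sketch below (Java 8 or later). The Fetcher and BatchWriter interfaces are hypothetical stand-ins for the Jsoup request and the database write, so the example runs without a network or a database. ExecutorService.invokeAll blocks until every task has completed, so the single write, and any "finished" message after it, happens only once all pages have been fetched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class, not from the original thread.
class BatchCollector {

    // Stand-ins (assumptions) for the Jsoup fetch and the database write.
    interface Fetcher { String fetch(String url) throws Exception; }
    interface BatchWriter { void writeAll(List<String> pages) throws Exception; }

    // Fetches all URLs on a fixed pool, then hands every successful result
    // to the writer in one call. Returns the number of pages written.
    static int fetchThenWriteOnce(List<String> urls, Fetcher fetcher,
                                  BatchWriter writer, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetcher.fetch(url));
            }
            List<String> pages = new ArrayList<>();
            // invokeAll returns only after every task has completed.
            for (Future<String> result : pool.invokeAll(tasks)) {
                try {
                    pages.add(result.get());
                } catch (ExecutionException e) {
                    e.getCause().printStackTrace(); // skip pages that failed
                }
            }
            // With JDBC, this is where one PreparedStatement.addBatch() call
            // per page followed by a single executeBatch() would go.
            writer.writeAll(pages);
            return pages.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = Arrays.asList("url-1", "url-2", "url-3");
        int stored = fetchThenWriteOnce(
                urls,
                url -> "<div>" + url + "</div>",
                pages -> System.out.println("Writing " + pages.size() + " pages in one batch"),
                3);
        System.out.println("Stored " + stored + " pages");
    }
}
```

For a very large URL list, holding every page in memory before writing may be too much; writing in fixed-size chunks is a possible variation.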