Java: how do I insert scanned URLs into a URL table?

I'm writing a web crawler and have run into some trouble. Here is some pseudocode for what I'm trying to do:

for every url in url-list {
    urlid = NextURLID;

    Insert urlid and url into their respective columns of the URL table

    NextURLID++;
}
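In Java the pseudocode becomes an ordinary loop with a counter. A minimal sketch with no database involved: the UrlIds class, the UrlRow record (standing in for one row of the URL table) and assignIds are hypothetical names, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

final class UrlIds {
    // Hypothetical stand-in for one row of the URL table: (urlid, url).
    record UrlRow(int urlId, String url) {}

    // Mirrors the pseudocode: each url gets the current NextURLID, then the counter advances.
    static List<UrlRow> assignIds(List<String> urls, int nextUrlId) {
        List<UrlRow> rows = new ArrayList<>();
        for (String url : urls) {
            rows.add(new UrlRow(nextUrlId, url)); // "insert" (urlid, url) as a new row
            nextUrlId++;                          // NextURLID++
        }
        return rows;
    }

    public static void main(String[] args) {
        List<UrlRow> rows = assignIds(List.of("https://jsoup.org/", "https://jsoup.org/download"), 0);
        rows.forEach(System.out::println);
    }
}
```

Each pass reads the next url from the list and pairs it with the current counter, so nothing has to be overwritten by hand; the counter is the only state carried from one iteration to the next.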

Here is what I have so far:

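// Imports this snippet needs (jsoup plus the checked exceptions it handles)
import java.io.IOException;
import java.sql.SQLException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Jsoup.connect(...).get() throws IOException, so the method declares it
void startCrawl() throws IOException {
    int NextURLID = 0;
    int NextURLIDScanned = 0;

    try
    {
        openConnection(); // open the database (helper defined elsewhere)
    }
    catch (SQLException | IOException e)
    {
        e.printStackTrace();
    }

    String url = "http://jsoup.org";
    System.out.printf("Fetching %s...%n", url);

    Document doc = Jsoup.connect(url).get();
    Elements links = doc.getElementsByTag("a"); // every <a> element on the page

    for (Element link : links) {
        int urlID = NextURLID;

        // Code to insert (urlID, url) into the URL table

        NextURLID++;
    }
}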

As you can see, I don't have the code for inserting the URL into the table yet. I think it would look something like this:

stat.executeUpdate("INSERT INTO urls VALUES ('"+urlID+"','"+url+"')");
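One caution about building the statement by string concatenation: a URL can legally contain a single quote (for example in a query string), and that quote ends the SQL string literal early, producing malformed SQL; the same gap is what makes concatenated SQL open to injection. A small demonstration with a made-up URL, where ConcatInsert and buildInsertSql are hypothetical wrappers around the line above:

```java
final class ConcatInsert {
    // Same concatenation as the executeUpdate line above, wrapped in a method for illustration.
    static String buildInsertSql(int urlID, String url) {
        return "INSERT INTO urls VALUES ('" + urlID + "','" + url + "')";
    }

    public static void main(String[] args) {
        String sql = buildInsertSql(5, "http://example.com/?q=it's");
        System.out.println(sql);
        // An odd number of single quotes means a string literal is left unterminated.
        long quotes = sql.chars().filter(c -> c == '\'').count();
        System.out.println("single quotes: " + quotes + (quotes % 2 == 1 ? " (unbalanced)" : ""));
    }
}
```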

But how do I overwrite the urlID and url variables with the new URL on each iteration of the loop?
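Nothing needs to be overwritten by hand: the enhanced for loop binds link to the next <a> element on every pass, so the URL for the current row is read from link inside the loop body. In jsoup, link.attr("abs:href") returns that link's href resolved against the page URL. The resolution step can be illustrated with plain java.net.URI; this is a stdlib analogue, not jsoup's implementation, and AbsHref/toAbsolute are hypothetical names:

```java
import java.net.URI;

final class AbsHref {
    // Resolves an href against the page URL, similar in spirit to jsoup's attr("abs:href").
    // Assumption: the page URL has a non-empty path (e.g. "https://jsoup.org/"); java.net.URI
    // can produce a malformed result for a base such as "https://jsoup.org" with no path.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://jsoup.org/apidocs/";
        for (String href : new String[] {"../download", "index.html", "https://example.com/"}) {
            // href takes a new value on every iteration; no manual reassignment needed
            System.out.println(toAbsolute(page, href));
        }
    }
}
```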

Thanks

In your case, a prepared statement is more appropriate:

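String insertURL = "INSERT INTO urls(urlID, url) VALUES (?, ?)";
PreparedStatement ps = dbConnection.prepareStatement(insertURL); // dbConnection: your open java.sql.Connection

for (Element link : links) {
     ps.setInt(1, NextURLID);                  // first ?  -> urlID
     ps.setString(2, link.attr("abs:href"));   // second ? -> this link's absolute URL (a String, so setString)
     ps.executeUpdate();                       // one new row per link

     NextURLID++;
}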

// ...
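A more self-contained variant of the same prepared-statement approach, as a sketch under these assumptions: the table has columns urlID (integer) and url (text), the link URLs have already been collected into a List<String> (for example from link.attr("abs:href")), and UrlInserter/insertUrls are hypothetical names. It closes the statement with try-with-resources and queues rows with addBatch/executeBatch instead of calling executeUpdate per link; each queued row still becomes its own new row in the table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class UrlInserter {
    // Assumes a table like: CREATE TABLE urls (urlID INT PRIMARY KEY, url VARCHAR(2048))
    private static final String INSERT_URL = "INSERT INTO urls(urlID, url) VALUES (?, ?)";

    /**
     * Inserts one new row per URL, numbering from nextUrlId.
     * Returns the next unused ID so the caller can keep counting across pages.
     */
    static int insertUrls(Connection conn, List<String> urls, int nextUrlId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_URL)) {
            for (String url : urls) {
                ps.setInt(1, nextUrlId); // first ?  -> urlID
                ps.setString(2, url);    // second ? -> url, bound as data so quotes are harmless
                ps.addBatch();           // queue this row
                nextUrlId++;
            }
            ps.executeBatch();           // send all queued rows
        }
        return nextUrlId;
    }
}
```

For example, NextURLID = UrlInserter.insertUrls(dbConnection, urls, NextURLID); keeps the counter going when the crawler moves on to the next page.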

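Are you trying to update the same row multiple times, or to create a new row for each link? For the value you insert you can simply use NextURLID; a separate urlID variable isn't needed (in the snippet as first posted it was never even declared). I don't understand what you mean by "how do I overwrite the urlID and url variables with the new URL on each loop iteration". What exactly do you want to overwrite: the rows in the database, or the local variables?

Sorry for not being clear! I'm trying to look at one URL, get all the links from that page, and put them into a table. Each new link inserted into the table should go in its own row. I was searching for another answer, but yours looks useful to me.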