Parsing an HTML page with Java jsoup and storing the data


I am trying to parse an HTML file with the jsoup library and get all the data in the table with class="scl_list", shown below (this is only a small part of the HTML page):

<table class="scl_list">
        <tr>
            <th align="center">Id:</th>
            <th align="center">Name:</th>
            <th align="center">Serial:</th>
            <th align="center">Status:</th>
            <th align="center">Ladestrom:</th>
            <th align="center">Z&auml;hleradresse:</th>
            <th align="center">Z&auml;hlerstand:</th>
        </tr>
        <tr>
            <th align="center">7</th>
            <th align="center">7</th>
            <th align="center">c3001c0020333347156a66</th>
            <th align="center">Idle</th>
            <th align="center">16.0</th>
            <th align="center">40100021</th>
            <th align="center">12464.25</th>
        </tr>
        <tr>
            <th align="center">21</th>
            <th align="center">21</th>
            <th align="center">c3002a003c343551086869</th>
            <th align="center">Idle</th>
            <th align="center">16.0</th>
            <th align="center">540100371</th>
            <th align="center">1219.73</th>
        </tr>
    </table>

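Jsoup is a simple and intuitive library, and you can find many examples online of how to read HTML tables with it. It is worth going through the jsoup documentation. Back to your question, a simple approach is: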

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SclListParser { // class name is arbitrary

    public static void main(String[] args) {
        String html = "<table class=\"scl_list\">\n" +
                      "        <tr>\n" +
                      "            <th align=\"center\">Id:</th>\n" +
                      "            <th align=\"center\">Name:</th>\n" +
                      "            <th align=\"center\">Serial:</th>\n" +
                      "            <th align=\"center\">Status:</th>\n" +
                      "            <th align=\"center\">Ladestrom:</th>\n" +
                      "            <th align=\"center\">Z&auml;hleradresse:</th>\n" +
                      "            <th align=\"center\">Z&auml;hlerstand:</th>\n" +
                      "        </tr>\n" +
                      "        <tr>\n" +
                      "            <th align=\"center\">7</th>\n" +
                      "            <th align=\"center\">7</th>\n" +
                      "            <th align=\"center\">c3001c0020333347156a66</th>\n" +
                      "            <th align=\"center\">Idle</th>\n" +
                      "            <th align=\"center\">16.0</th>\n" +
                      "            <th align=\"center\">40100021</th>\n" +
                      "            <th align=\"center\">12464.25</th>\n" +
                      "        </tr>\n" +
                      "        <tr>\n" +
                      "            <th align=\"center\">21</th>\n" +
                      "            <th align=\"center\">21</th>\n" +
                      "            <th align=\"center\">c3002a003c343551086869</th>\n" +
                      "            <th align=\"center\">Idle</th>\n" +
                      "            <th align=\"center\">16.0</th>\n" +
                      "            <th align=\"center\">540100371</th>\n" +
                      "            <th align=\"center\">1219.73</th>\n" +
                      "        </tr>\n" +
                      "    </table>";
        Document doc = Jsoup.parse(html);
        // every row of the table with class "scl_list", header row included
        Elements trs = doc.select("table.scl_list tr");
        List<List<String>> data = new ArrayList<>();
        for (Element tr : trs) {
            // this table uses <th> for all cells, including the data rows
            List<String> row = tr.select("th").stream()
                                 .map(Element::text)
                                 .collect(Collectors.toList());
            data.add(row);
        }
        data.forEach(System.out::println);
    }
}
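Output:

[Id:, Name:, Serial:, Status:, Ladestrom:, Zähleradresse:, Zählerstand:]
[7, 7, c3001c0020333347156a66, Idle, 16.0, 40100021, 12464.25]
[21, 21, c3002a003c343551086869, Idle, 16.0, 540100371, 1219.73]

Since the first row seems to contain only the table headers, you can skip it with a simple indexed for loop that starts at the second row (index 1).

I assume your data describes electricity meters, so I would suggest implementing a small class as a data container. It could look like this:

import java.util.List;

class Meter {
    int id;
    String name;
    String serial;
    String status;
    double chargingCurrent; // Ladestrom
    String address;         // Zähleradresse
    double meterReading;    // Zählerstand

    // expects the cells in table column order:
    // Id, Name, Serial, Status, Ladestrom, Zähleradresse, Zählerstand
    public Meter(List<String> data) {
        this.id = Integer.parseInt(data.get(0));
        this.name = data.get(1);
        this.serial = data.get(2);
        this.status = data.get(3);
        this.chargingCurrent = Double.parseDouble(data.get(4));
        this.address = data.get(5);
        this.meterReading = Double.parseDouble(data.get(6));
    }

    // needed for the Meter{...} output shown below
    @Override
    public String toString() {
        return "Meter{id=" + id + ", name=" + name + ", serial=" + serial
                + ", status=" + status + ", chargingCurrent=" + chargingCurrent
                + ", address=" + address + ", meterReading=" + meterReading + "}";
    }

    // getters & setters
}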
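If you can use Java 16 or newer, a record is a more compact alternative to the hand-written Meter class: the compiler generates the constructor, the accessors (id(), serial(), ...), equals/hashCode and toString. The sketch below keeps the same assumption as the class above, namely that the cells arrive in the table's column order. MeterRecordDemo and fromRow are hypothetical names, and the size check is an addition so that a short row fails with a clear message instead of an IndexOutOfBoundsException.

```java
import java.util.List;

public class MeterRecordDemo {

    // Compact data container; Java 16+ generates constructor, accessors,
    // equals, hashCode and toString for it.
    record Meter(int id, String name, String serial, String status,
                 double chargingCurrent, String address, double meterReading) {

        // Hypothetical factory: assumes the column order
        // Id, Name, Serial, Status, Ladestrom, Zähleradresse, Zählerstand.
        static Meter fromRow(List<String> row) {
            if (row.size() < 7) {
                throw new IllegalArgumentException("expected 7 cells, got " + row.size());
            }
            return new Meter(
                    Integer.parseInt(row.get(0)),
                    row.get(1),
                    row.get(2),
                    row.get(3),
                    Double.parseDouble(row.get(4)),
                    row.get(5),
                    Double.parseDouble(row.get(6)));
        }
    }

    public static void main(String[] args) {
        List<String> row = List.of("7", "7", "c3001c0020333347156a66", "Idle",
                                   "16.0", "40100021", "12464.25");
        Meter m = Meter.fromRow(row);
        System.out.println(m.serial() + " " + m.meterReading());
    }
}
```

In the parsing loop you would call Meter.fromRow(row) instead of new Meter(row). Note that the generated toString uses square brackets (with OpenJDK it looks like Meter[id=7, ...]), so the printed lines differ slightly from the Meter{...} format of the class above.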
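Using that class, the data rows can be collected into a list of Meter objects:

// html is the same string as in the first example
Document doc = Jsoup.parse(html);
Elements trs = doc.select("table.scl_list tr");
List<Meter> meters = new ArrayList<>();
// start at index 1 to skip the header row
for (int i = 1; i < trs.size(); i++) {
    List<String> row = trs.get(i).select("th").stream()
                          .map(Element::text)
                          .collect(Collectors.toList());
    meters.add(new Meter(row));
}
meters.forEach(System.out::println);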
Output:

Meter{id=7, name=7, serial=c3001c0020333347156a66, status=Idle, chargingCurrent=16.0, address=40100021, meterReading=12464.25}
Meter{id=21, name=21, serial=c3002a003c343551086869, status=Idle, chargingCurrent=16.0, address=540100371, meterReading=1219.73}
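Instead of skipping the header with an index, you can let the selector do it: jsoup's :gt(n) pseudo-selector matches elements whose sibling index is greater than n, and :eq(n) matches the element at index n. The sketch below uses them and, as an alternative to a fixed-order class, stores each data row as a map from header text (trailing colon removed) to cell text, so a reordered column still ends up under the right key. SclListToMaps and parse are hypothetical names. It assumes, as in the sample, that all rows share one parent (the HTML parser puts them into an implicit tbody) and that every cell is a th; with a separate thead the sibling indices restart inside tbody, and :gt(0) would then also drop the first data row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SclListToMaps {

    // Hypothetical helper: turns every data row of table.scl_list into a map
    // from header text (trailing ':' removed) to cell text, in column order.
    static List<Map<String, String>> parse(String html) {
        Document doc = Jsoup.parse(html);

        // Header labels from the first row, e.g. "Id:" -> "Id"
        List<String> headers = new ArrayList<>();
        for (Element th : doc.select("table.scl_list tr:eq(0) th")) {
            headers.add(th.text().replaceAll(":$", ""));
        }

        List<Map<String, String>> rows = new ArrayList<>();
        // tr:gt(0) skips the header row (sibling index 0 inside the implicit tbody)
        for (Element tr : doc.select("table.scl_list tr:gt(0)")) {
            Elements cells = tr.select("th");
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < headers.size() && i < cells.size(); i++) {
                row.put(headers.get(i), cells.get(i).text());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table class=\"scl_list\">"
                + "<tr><th>Id:</th><th>Name:</th><th>Serial:</th><th>Status:</th>"
                + "<th>Ladestrom:</th><th>Z&auml;hleradresse:</th><th>Z&auml;hlerstand:</th></tr>"
                + "<tr><th>7</th><th>7</th><th>c3001c0020333347156a66</th><th>Idle</th>"
                + "<th>16.0</th><th>40100021</th><th>12464.25</th></tr>"
                + "<tr><th>21</th><th>21</th><th>c3002a003c343551086869</th><th>Idle</th>"
                + "<th>16.0</th><th>540100371</th><th>1219.73</th></tr>"
                + "</table>";
        parse(html).forEach(System.out::println);
    }
}
```

For the sample table the first map prints as {Id=7, Name=7, Serial=c3001c0020333347156a66, Status=Idle, Ladestrom=16.0, Zähleradresse=40100021, Zählerstand=12464.25}. A map is more tolerant of layout changes, while the Meter class gives you typed fields; which one fits better depends on how you want to store the data afterwards.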