How to get data from the page source using Java and jsoup - Fatal编程技术网

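How can I get the values $23,000,000 and $47,351,251 from the page source below? I only want to pull these values out of the source code, but I'm not sure what the best approach is.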

    <div class="txt-block">
        <h4 class="inline">Budget:</h4>$23,000,000
        <span class="attribute">(estimated)</span>
    </div>

    <div class="txt-block">
        <h4 class="inline">Opening Weekend USA:</h4> $260,382,
        <span class="attribute">20 December 2013</span>, <span class="attribute">Limited Release</span>
    </div>

    <div class="txt-block">
        <h4 class="inline">Gross USA:</h4> $25,568,251
    </div>
    <div class="txt-block">
        <h4 class="inline">Cumulative Worldwide Gross:</h4> $47,351,251
    </div>
It's working, but is there a better solution?

Use ownText() instead of substring, and loop only once instead of twice. Try this:

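    import java.io.IOException;

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class BoxOffice {
        public static void main(String[] args) throws IOException {
            String url = "https://www.imdb.com/title/tt1798709";
            Connection connection = Jsoup.connect(url);
            Document document = connection.get();
            Elements elements = document.select("div.txt-block");

            String gross = "";
            String budget = "";

            // final String locals initialised with literals are compile-time constants,
            // so they can be used as switch case labels
            final String budgetLabel = "Budget:";
            final String grossLabel = "Cumulative Worldwide Gross:";

            for (Element e : elements) {
                Element h4 = e.getElementsByTag("h4").first();
                if (h4 == null) {
                    continue; // skip txt-block divs that have no <h4> label
                }
                switch (h4.text()) {
                    case budgetLabel:
                        // ownText() skips the <h4> and <span> children, leaving e.g. "$23,000,000"
                        budget = e.ownText();
                        break;
                    case grossLabel:
                        gross = e.ownText();
                        break;
                    default:
                        break;
                }
                if (!gross.isEmpty() && !budget.isEmpty()) {
                    break; // optional: stop scanning once both values are found
                }
            }
            System.out.println(gross + ", " + budget);
        }
    }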
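A network-free way to check the single-loop idea is to run it on the markup from the question. This is a sketch for illustration, not the answer's exact code: the class OwnTextDemo and method ownTextByLabel are hypothetical names, and instead of a switch it collects every label in one pass into a map. It also prints text() next to ownText() for the budget block to show why ownText() removes the need for substring.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class OwnTextDemo {

    // Shortened copy of the markup from the question (Opening Weekend block left out)
    static final String HTML =
            "<div class=\"txt-block\"><h4 class=\"inline\">Budget:</h4>$23,000,000"
            + " <span class=\"attribute\">(estimated)</span></div>"
            + "<div class=\"txt-block\"><h4 class=\"inline\">Gross USA:</h4> $25,568,251</div>"
            + "<div class=\"txt-block\">"
            + "<h4 class=\"inline\">Cumulative Worldwide Gross:</h4> $47,351,251</div>";

    // One pass over div.txt-block: maps each <h4> label to the div's own (trimmed) text
    static Map<String, String> ownTextByLabel(String html) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Element block : Jsoup.parse(html).select("div.txt-block")) {
            Element h4 = block.getElementsByTag("h4").first();
            if (h4 != null) {
                values.put(h4.text(), block.ownText());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Element budgetBlock = Jsoup.parse(HTML).select("div.txt-block").first();
        // text() includes the <h4> label and the <span>; ownText() keeps only the div's own text nodes
        System.out.println(budgetBlock.text());
        System.out.println(budgetBlock.ownText()); // $23,000,000

        Map<String, String> values = ownTextByLabel(HTML);
        System.out.println(values.get("Budget:") + ", " + values.get("Cumulative Worldwide Gross:"));
    }
}
```

Because the map is keyed by label, picking up another field such as "Gross USA:" needs no extra pass over the page.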

You can do this with jsoup pseudo-selectors:

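    // html holds the page source as a String
    Document document = Jsoup.parse(html);
    // Qualify with .txt-block: :contains() also matches any enclosing div, and select() lists those first
    String budget = document.select("div.txt-block:contains(Budget:)").first().ownText();
    String gross = document.select("div.txt-block:contains(Cumulative Worldwide Gross:)").first().ownText();
    System.out.println(gross + ", " + budget);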
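To see why a bare div:contains(...) can pick the wrong element on a full page, here is a small offline sketch. The wrapper div with id "wrapper" is hypothetical, standing in for whatever container the real page puts around the txt-block divs; ContainsDemo and its helper methods are likewise illustrative names. Since :contains() tests an element's combined text including descendants, the wrapper matches too, and jsoup returns matches in document order, so first() lands on the wrapper, whose own text is empty.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

class ContainsDemo {

    // One block from the question inside a hypothetical wrapper div
    static final String HTML =
            "<div id=\"wrapper\">"
            + "<div class=\"txt-block\"><h4 class=\"inline\">Budget:</h4>$23,000,000"
            + " <span class=\"attribute\">(estimated)</span></div>"
            + "</div>";

    static Elements matches(String selector) {
        return Jsoup.parse(HTML).select(selector);
    }

    static int countMatches(String selector) {
        return matches(selector).size();
    }

    // Own text of the first match in document order ("" if that element has no own text)
    static String firstOwnText(String selector) {
        return matches(selector).first().ownText();
    }

    public static void main(String[] args) {
        // The wrapper's combined text contains "Budget:", so it matches and comes first
        System.out.println(countMatches("div:contains(Budget:)"));             // 2
        System.out.println("[" + firstOwnText("div:contains(Budget:)") + "]"); // []

        // Requiring the txt-block class leaves only the div that owns the amount
        System.out.println(firstOwnText("div.txt-block:contains(Budget:)"));  // $23,000,000
    }
}
```

An alternative with the same effect is to match the label itself, for example h4:containsOwn(Budget:), and then read the parent's ownText().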

You can find more about pseudo-selectors here:

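If the dollar amounts were wrapped in DOM elements of their own, retrieving them would be almost trivial.

"It's working, but how can it be done better?" In that case, your question is better suited for

I'm voting to close this question because it belongs on another site in the Stack Exchange network, namely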
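Both answers return the amount as text, such as "$23,000,000". If a numeric value is needed, one minimal approach, assuming whole US dollars with comma grouping and no cents, is to keep only the digits and parse them as a long. DollarAmount and parseDollars are hypothetical names; an amount with cents such as "$1.50" would be misread as 150 by this sketch.

```java
// Hypothetical helper: converts a scraped amount such as "$23,000,000" to a number.
// Assumes whole dollars with no decimal part.
class DollarAmount {

    static long parseDollars(String text) {
        // Drop "$", commas, and surrounding whitespace, keeping only the digits
        String digits = text.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("No digits in: " + text);
        }
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        System.out.println(parseDollars("$23,000,000"));  // 23000000
        System.out.println(parseDollars(" $47,351,251")); // 47351251
    }
}
```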