Java: extracting HTML with jsoup - two adjacent span tags

My HTML code looks like this:

<div class="cloud recommended">
    <div id="bigcloud" class="eventwithfoto">
        <h1>Artist Name</h1>
        <div id="eventphoto">
            <a href="http://linkToPhoto.jpg" target="_top" rel="lightbox"><img src="http://linkToPhoto.jpg" height="150"></a>
        </div>

        <div id="eventmain" style="margin-top: 12px;">
            <p id="eventwhere"><span><b>Name of place<br></b></span><span>Address of place</span>
            <br> tel.: +48 111 222 111 <br><a href="http://www.linktoplace.com" target="_blank">http://www.linktoplace.com</a> </p>
            <p id="eventdate">2017-04-20 godz. 20:00</p>


            <div id="eventadmission">
                120 zł
            </div>

        </div>

        <div class="clear"></div>
        <div id="eventdesc">
            Here is some descr<br/>Some other descr 
            <div class="clear"></div>

            <br>
                <a href="http://link.com" target="_blank">link to event</a>
        </div>
    </div>
</div>
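But every string comes back empty. What am I doing wrong?
How can I parse this specific HTML format to get these variables?

I wrote a small example to demonstrate: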

package sandbox.jsoup;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsoupMain {
    private final String HTML = "<div class=\"cloud recommended\">\n" +
            "    <div id=\"bigcloud\" class=\"eventwithfoto\">\n" +
            "        <h1>Artist Name</h1>\n" +
            "        <div id=\"eventphoto\">\n" +
            "            <a href=\"http://linkToPhoto.jpg\" target=\"_top\" rel=\"lightbox\"><img src=\"http://linkToPhoto.jpg\" height=\"150\"></a>\n" +
            "        </div>\n" +
            "\n" +
            "        <div id=\"eventmain\" style=\"margin-top: 12px;\">\n" +
            "            <p id=\"eventwhere\"><span><b>Name of place<br></b></span><span>Address of place</span>\n" +
            "            <br> tel.: +48 111 222 111 <br><a href=\"http://www.linktoplace.com\" target=\"_blank\">http://www.linktoplace.com</a> </p>\n" +
            "            <p id=\"eventdate\">2017-04-20 godz. 20:00</p>\n" +
            "\n" +
            "\n" +
            "            <div id=\"eventadmission\">\n" +
            "                120 zł\n" +
            "            </div>\n" +
            "\n" +
            "        </div>\n" +
            "\n" +
            "        <div class=\"clear\"></div>\n" +
            "        <div id=\"eventdesc\">\n" +
            "            Here is some descr<br/>Some other descr \n" +
            "            <div class=\"clear\"></div>\n" +
            "\n" +
            "            <br>\n" +
            "                <a href=\"http://link.com\" target=\"_blank\">link to event</a>\n" +
            "        </div>\n" +
            "    </div>\n" +
            "</div>";

    public static void main(String[] args) {
        new JsoupMain().findTwoSpans();
    }

    private void findTwoSpans() {
        Document doc = Jsoup.parse(HTML);
        Element eventWhere = doc.getElementById("eventwhere");
        Elements spans = eventWhere.select("span");
        System.out.println("span[0]="+spans.get(0).text());
        Element spanTwo = spans.get(1);
        System.out.println("span[1]="+spanTwo.text());

        // Get phone: it is a bare text node inside #eventwhere, so search that
        // element's text. (Element.after(...) inserts nodes and would modify
        // the document, so it is not used here.)
        String textMain = eventWhere.text();

        int beginPos = textMain.indexOf("tel.: ");
        int endPos = textMain.indexOf(" http://");
        // indexOf returns -1 when nothing is found; 0 is a valid position.
        if (beginPos >= 0 && endPos > beginPos) {
            String phone = textMain.substring(beginPos + 6, endPos);
            System.out.println("Found phone: " + phone);
        }
        else {
            System.out.println("Phone not found: " + textMain);
        }
    }
}
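
The example imports java.util.regex but looks the phone number up with indexOf, which depends on a URL following it. As a possible alternative, here is a minimal sketch that pulls the number out with a regular expression instead. The class name PhoneRegexSketch, the method extractPhone, and the pattern itself are assumptions made for illustration; the pattern only covers numbers written like the one in the question (an optional "+" and digit groups separated by single spaces).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneRegexSketch {
    // "tel.:" followed by an optional "+" and digit groups separated by single spaces.
    // This pattern is an assumption based on the single sample in the question.
    private static final Pattern PHONE =
            Pattern.compile("tel\\.:\\s*(\\+?\\d+(?: \\d+)*)");

    /** Returns the first phone number found after "tel.:", if any. */
    public static Optional<String> extractPhone(String text) {
        Matcher m = PHONE.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Text as it might come from eventWhere.text() in the example above.
        String text = "Name of place Address of place tel.: +48 111 222 111 "
                + "http://www.linktoplace.com";
        System.out.println(extractPhone(text).orElse("Phone not found"));
    }
}
```

Unlike the indexOf version, this does not need anything particular to follow the number, so it should still work if the link is missing.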

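Comments:

Try using nextElementSibling(): after finding the first span, iterate until you find the next span.

@AlexC Thanks for the tip! Right now I can only see eventName. Could you help me and tell me what I am doing wrong when displaying the other elements?

CSS3 selectors can select the nth child, perhaps something like "#eventwhere > span:nth-child(2) > b". I would have to test this, but you can do it with a CSS selector, or you can use doc.getElementsByTag("#eventwhere").select("span").getnextSiblingElement() (checking for null if it does not exist, etc.).

So I am not sure what I am doing wrong, because now even doc.getElementsByTag("#eventwhere").select("span").text() returns an empty string :(

Thanks man :) One more question: could you look at my original HTML and tell me how to extract the phone number?

Updated the code with a "low-tech" phone extraction.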
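
The empty result reported in the comments is consistent with how jsoup treats these calls: getElementsByTag matches tag names, so passing "#eventwhere" (an id selector) finds nothing, whereas select and getElementById accept the id. The following is a minimal sketch of the two approaches suggested in the comments, using a trimmed copy of the question's markup. The class and method names (AdjacentSpansSketch, placeAndAddress, addressByNthChild, matchesByTagName) are made up for illustration, and the code assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AdjacentSpansSketch {
    // Trimmed copy of the #eventwhere paragraph from the question.
    public static final String HTML =
            "<p id=\"eventwhere\"><span><b>Name of place<br></b></span>"
          + "<span>Address of place</span>"
          + "<br> tel.: +48 111 222 111 <br>"
          + "<a href=\"http://www.linktoplace.com\">http://www.linktoplace.com</a></p>";

    /** Returns {place name, address} using sibling navigation. */
    public static String[] placeAndAddress(String html) {
        Document doc = Jsoup.parse(html);
        // "#eventwhere" is an id selector, so it belongs in select(), not getElementsByTag().
        Element first = doc.select("#eventwhere > span").first();
        if (first == null) {
            return new String[] { "", "" };
        }
        // The second span directly follows the first one.
        Element second = first.nextElementSibling();
        return new String[] { first.text(), second == null ? "" : second.text() };
    }

    /** Returns the address using the :nth-child selector instead. */
    public static String addressByNthChild(String html) {
        // No trailing "> b": in this markup only the first span wraps its text in <b>.
        Element span = Jsoup.parse(html).select("#eventwhere > span:nth-child(2)").first();
        return span == null ? "" : span.text();
    }

    /** Counts elements whose tag name is literally "#eventwhere". */
    public static int matchesByTagName(String html) {
        return Jsoup.parse(html).getElementsByTag("#eventwhere").size();
    }

    public static void main(String[] args) {
        String[] result = placeAndAddress(HTML);
        System.out.println("name=" + result[0]);
        System.out.println("address=" + result[1]);
        System.out.println("nth-child=" + addressByNthChild(HTML));
        System.out.println("getElementsByTag matches=" + matchesByTagName(HTML));
    }
}
```

Both approaches rely on the two spans being the first two element children of #eventwhere; if the page layout varies, the null checks keep the sketch from throwing.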

Output of the JsoupMain example:

span[0]=Name of place
span[1]=Address of place
Found phone: +48 111 222 111