Extracting nested HTML tags in Java?

I have the following HTML snippet:

String source = "<p>dsdds</p>"
                + "<ul class=\"some-class-name\">"
                + "<li>data</li>"
                + "<li><div><ul><li>data</li></ul></div></li>"
                + "</ul>"
                + "<p>data</p>"
                + "<ul>data</ul><div>data</div>";
The result I want to get is:

<ul class="some-class-name">
    <li>data</li>
    <li><div><ul><li>data</li></ul></div></li>
</ul>
What I have tried so far:

        String endTag = "</ul>";
        int origin = source.indexOf("<ul class=\"some-class-name\">");
        int currentFrom = origin;
        int to = source.indexOf(endTag, currentFrom);
        while (true) {
            int curIndex = source.indexOf("<ul", currentFrom + 1);
            if (curIndex > -1) {
                currentFrom = curIndex;
                to = source.indexOf(endTag, currentFrom);
            } else {
                to = source.indexOf(endTag, to);
                break;
            }
        }
        System.out.println(source.substring(origin, to + endTag.length()));
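As far as I can trace it, the loop above keeps jumping to every later "<ul", including ones outside the target list, so on the sample input it returns everything up to the final "</ul>" and the trailing <p>data</p><ul>data</ul> ends up in the result too. If you want to stay with indexOf, one option is to count nesting depth: +1 for each opening <ul, -1 for each </ul>, and stop when the count returns to zero. This is a minimal sketch under stated assumptions (well-formed markup, no self-closing <ul/>, no tag text inside comments, CDATA or attribute values); the names BalancedTagExtractor and extractBalanced are hypothetical. A real parser, as the answers suggest, remains the safer choice.

```java
// Hypothetical helper: extracts an element and its nested children by counting depth.
// Assumes well-formed markup, no self-closing tags, and no tag text inside comments,
// CDATA sections or attribute values.
class BalancedTagExtractor {

    // Returns source from the first startTag up to its matching close tag, or null.
    // startTag must itself begin with "<" + tagName (e.g. "<ul class=\"x\">" for "ul").
    static String extractBalanced(String source, String startTag, String tagName) {
        int origin = source.indexOf(startTag);
        if (origin < 0) {
            return null;
        }
        String open = "<" + tagName;
        String close = "</" + tagName + ">";
        int depth = 0;
        int pos = origin;
        while (true) {
            int nextOpen = indexOfOpenTag(source, open, pos);
            int nextClose = source.indexOf(close, pos);
            if (nextClose < 0) {
                return null; // ran out of close tags: unbalanced input
            }
            if (nextOpen >= 0 && nextOpen < nextClose) {
                depth++; // another opening tag starts before the next close tag
                pos = nextOpen + open.length();
            } else {
                depth--; // the next close tag comes first
                pos = nextClose + close.length();
                if (depth == 0) {
                    return source.substring(origin, pos);
                }
            }
        }
    }

    // Finds "<tagName" followed by '>' or whitespace, so "<ul" does not match "<ulx".
    private static int indexOfOpenTag(String s, String open, int from) {
        int i = s.indexOf(open, from);
        while (i >= 0) {
            int after = i + open.length();
            if (after < s.length()
                    && (s.charAt(after) == '>' || Character.isWhitespace(s.charAt(after)))) {
                return i;
            }
            i = s.indexOf(open, i + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        String source = "<p>dsdds</p>"
                + "<ul class=\"some-class-name\">"
                + "<li>data</li>"
                + "<li><div><ul><li>data</li></ul></div></li>"
                + "</ul>"
                + "<p>data</p>"
                + "<ul>data</ul><div>data</div>";
        // Prints only the outer <ul class="some-class-name">...</ul> with its nested list.
        System.out.println(extractBalanced(source, "<ul class=\"some-class-name\">", "ul"));
    }
}
```

The depth counter is what the original loop is missing: it only stops at a closing tag once every opening tag seen since the start has been closed.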

    Fortunately, your snippet is valid XHTML, which means it is also valid XML.
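The claim that the fragment parses as XML once it has a single root can be checked with the JDK's built-in DOM parser. This is a minimal sketch; WellFormedCheck and isWellFormedXml are hypothetical names, and installing a DefaultHandler as the error handler is just one way to keep the parser from reporting fatal errors on stderr before throwing. The unwrapped snippet should be rejected because it has several top-level elements, while wrapping it in <div>...</div> gives it the single root it needs.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as a well-formed XML document.
class WellFormedCheck {

    static boolean isWellFormedXml(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (ParserConfigurationException | IOException e) {
            // Parser setup or I/O problem, not a markup problem.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "<p>dsdds</p>"
                + "<ul class=\"some-class-name\">"
                + "<li>data</li>"
                + "<li><div><ul><li>data</li></ul></div></li>"
                + "</ul>"
                + "<p>data</p>"
                + "<ul>data</ul><div>data</div>";
        System.out.println(isWellFormedXml(source));                     // false: several top-level elements
        System.out.println(isWellFormedXml("<div>" + source + "</div>")); // true: single root element
    }
}
```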

    XPath is designed specifically for extracting nodes from XML:

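    import java.io.StringReader;
    import java.io.StringWriter;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Node;
    import org.xml.sax.InputSource;

    // Must have a single root in order to parse.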
    String input = "<div>" + source + "</div>";
    
    XPath xpath = XPathFactory.newInstance().newXPath();
    Node node = (Node)
        xpath.evaluate("//ul[@class='some-class-name']",
            new InputSource(new StringReader(input)),
            XPathConstants.NODE);
    
    // Serialize the matched node back to markup, without an <?xml ...?> declaration.
    StringWriter result = new StringWriter();
    Transformer transformer =
        TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.transform(new DOMSource(node), new StreamResult(result));
    
    String fragment = result.toString();
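If more than one element can match, the same XPath approach works with XPathConstants.NODESET, which makes evaluate return an org.w3c.dom.NodeList instead of a single Node. The sketch below (XPathAllMatches and extractAll are hypothetical names) serializes every match; with the expression //ul it should pick up all three lists in the sample, the nested one included, in document order. Checked exceptions are wrapped in an unchecked exception only to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: serializes every node matched by an XPath expression.
class XPathAllMatches {

    static List<String> extractAll(String source, String expression) {
        // Wrap in a single root element so the fragment parses as XML.
        String input = "<div>" + source + "</div>";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression,
                    new InputSource(new StringReader(input)), XPathConstants.NODESET);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> fragments = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                fragments.add(out.toString());
            }
            return fragments;
        } catch (XPathExpressionException | TransformerException e) {
            // Wrapped only to keep the example short.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String source = "<p>dsdds</p>"
                + "<ul class=\"some-class-name\">"
                + "<li>data</li>"
                + "<li><div><ul><li>data</li></ul></div></li>"
                + "</ul>"
                + "<p>data</p>"
                + "<ul>data</ul><div>data</div>";
        // "//ul" matches the outer list, the nested list and the trailing <ul>data</ul>.
        for (String fragment : extractAll(source, "//ul")) {
            System.out.println(fragment);
        }
    }
}
```

Passing "//ul[@class='some-class-name']" instead narrows the result to the single list from the question.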
    
    Using jsoup, you can do it like this:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    Document doc = Jsoup.parse(source);
    Element e = doc.select("ul.some-class-name").first();
    System.out.println(e);
    
    Result:

    <ul class="some-class-name">
     <li>data</li>
     <li>
      <div>
       <ul>
        <li>data</li>
       </ul>
      </div></li>
    </ul>
    
    Don't reinvent the wheel with hand-written loops; use an HTML parser such as jsoup.