Java: How do I get the HTML DOM path from text content?

HTML file:

<html>
    <body>
        <div class="main">
            <p id="tID">content</p>
        </div>
    </body>
</html>
Chrome Developer Tools has this feature (Elements tab, bottom bar). I would like to know how to implement it in Java.

Thanks for your help :)

Have fun :)

Java code

import java.io.File;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;



public class Teste {

    public static void main(String[] args) {
        try {
            // read and clean document
            TagNode tagNode = new HtmlCleaner().clean(new File("test.xml"));
            Document document = new DomSerializer(new CleanerProperties()).createDOM(tagNode);

            // use XPath to find target node
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate("//*[text()='content']", document, XPathConstants.NODE);

            // assembles jquery/css selector
            String result = "";
            while (node != null && node.getParentNode() != null) {
                result = readPath(node) + " " + result;
                node = node.getParentNode();
            }
            System.out.println(result);
            // prints: html body div#myDiv.foo.bar p#tID

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Gets id and class attributes of this node
    private static String readPath(Node node) {
        NamedNodeMap attributes = node.getAttributes();
        String id = readAttribute(attributes.getNamedItem("id"), "#");
        String clazz = readAttribute(attributes.getNamedItem("class"), ".");
        return node.getNodeName() + id + clazz;
    }

    // Read attribute
    private static String readAttribute(Node node, String token) {
        String result = "";
        if(node != null) {
            result = token + node.getTextContent().replace(" ", token);
        }
        return result;
    }

}

XML example

<html>
    <body>
        <br>
        <div id="myDiv" class="foo bar">
            <p id="tID">content</p>
        </div>
    </body>
</html>



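Explanation

  • The document object points to the parsed XML.
  • The XPath //*[text()='content'] finds the elements whose text is exactly 'content'; the first match is returned as the node.
  • The while loop walks up from that node to the root element, collecting the tag name, id, and classes of each element along the way.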
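
The three steps above can also be tried without any third-party jar. The following is only a minimal sketch, assuming the markup is already well-formed (so the JDK's own parser accepts it) and that the search text contains no single quote. The class name DomPathSketch and the methods selectorForText and describe are hypothetical names chosen for illustration; they are not part of the answer's code.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class (an assumption for illustration only).
final class DomPathSketch {

    // Returns a CSS-like path to the first element whose text equals `text`,
    // or an empty string when nothing matches. Expects well-formed markup.
    static String selectorForText(String wellFormedMarkup, String text) {
        try {
            // Step 1: parse the markup into a DOM document
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));

            // Step 2: find the first element with a matching text node
            // (assumes `text` contains no single quote)
            String expression = "//*[text()='" + text + "']";
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODE);

            // Step 3: walk up through the ancestors, prepending each element
            StringBuilder path = new StringBuilder();
            while (node instanceof Element) {
                String part = describe((Element) node);
                path.insert(0, part + (path.length() == 0 ? "" : " "));
                node = node.getParentNode();
            }
            return path.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the markup", e);
        }
    }

    // Builds "tag#id.class1.class2" for one element
    static String describe(Element element) {
        StringBuilder part = new StringBuilder(element.getNodeName());
        String id = element.getAttribute("id").trim();
        if (!id.isEmpty()) {
            part.append('#').append(id);
        }
        String classes = element.getAttribute("class").trim();
        if (!classes.isEmpty()) {
            part.append('.').append(classes.replaceAll("\\s+", "."));
        }
        return part.toString();
    }

    public static void main(String[] args) {
        String markup = "<html><body><br/>"
                + "<div id=\"myDiv\" class=\"foo bar\"><p id=\"tID\">content</p></div>"
                + "</body></html>";
        // prints: html body div#myDiv.foo.bar p#tID
        System.out.println(selectorForText(markup, "content"));
    }
}
```

Stopping the loop at the first non-Element ancestor keeps the Document node out of the path, which is the same effect the answer gets by checking getParentNode() for null.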

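More explanation

  • This new solution uses HtmlCleaner, so the input may contain an unclosed tag such as <br>, and the cleaner will turn it into a properly closed element.
  • To use HtmlCleaner, just download the latest jar.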

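  • Do you mean Java or JavaScript?
  • But it is not an XML document. If a <br> or any other tag has no closing tag, it fails to parse:
    org.xml.sax.SAXParseException: The element type "br" must be terminated by the matching end-tag "</br>".
  • I edited my answer to handle malformed XML. Take a look.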
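
For anyone who wants to see the parse failure from the comment above, here is a small sketch that feeds the same kind of markup to the JDK's built-in XML parser, once with an unclosed <br> and once with a closed one. StrictParseCheck and isWellFormed are hypothetical names used only for this illustration. The sketch shows why a strict parser rejects ordinary HTML; it does not use HtmlCleaner.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class (an assumption for illustration only).
final class StrictParseCheck {

    // Returns true when the JDK's XML parser accepts the markup as well-formed
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of logging them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXParseException e) {
            // Raised for markup such as an unclosed <br>
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        String unclosed = "<html><body><br><p id=\"tID\">content</p></body></html>";
        String closed = "<html><body><br/><p id=\"tID\">content</p></body></html>";
        System.out.println(isWellFormed(unclosed)); // prints: false
        System.out.println(isWellFormed(closed));   // prints: true
    }
}
```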