Java: Displaying HTML tags using Jsoup

java html tags

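With Jsoup it is easy to count how many times a particular tag occurs in a text. For example, I am trying to see how many times the anchor tag occurs in a given text: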

    String content = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>. <p>An <a href='http://example.com/'><b>example</b></a> link.</p>. <p>An <a href='http://example.com/'><b>example</b></a> link.</p>. <p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
    Document doc = Jsoup.parse(content);
    Elements links = doc.select("a[href]"); // a with href
    System.out.println(links.size());

This gives me a count of 4. If I have a sentence and want to know whether it contains any HTML tags at all, is that possible with Jsoup? Thanks.

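You might be better off with a regular expression, but if you really want to use JSoup, you can try to match all elements and then subtract 4, because JSoup automatically adds four elements: first the root element, and then the <html>, <head> and <body> elements.

This might look roughly as follows: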

// attempt to count html elements in string - incorrect code, see below 
public static int countHtmlElements(String content) {
    Document doc = Jsoup.parse(content);
    Elements elements = doc.select("*");
    return elements.size()-4;
}

However, this gives a wrong result if the text itself contains <html>, <head> or <body>; compare the following results:

// gives a correct count of 2 html elements (here <b> and <i>)
System.out.println(countHtmlElements("some <b>text</b> with <i>markup</i>"));
// incorrectly counts 0 elements, as the body is subtracted
System.out.println(countHtmlElements("<body>this gives a wrong result</body>"));

So, to make this work, you would have to check for the "magic" tags separately; this is why I feel a regular expression might be simpler.
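
As a rough sketch of the regular-expression route mentioned above (this is not from the original answer; the names TagSniffer, countOpeningTags and containsTag are made up for illustration), something like the following could work with plain java.util.regex. It is only a heuristic: it looks for tag-shaped tokens and does not handle comments, CDATA, or a ">" inside an attribute value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: a heuristic, regex-based tag detector.
final class TagSniffer {
    // Opening or self-closing tag, e.g. <b>, <a href='...'>, <br/>.
    private static final Pattern OPENING_TAG =
            Pattern.compile("<[a-zA-Z][a-zA-Z0-9]*(?:\\s[^<>]*)?/?>");

    // Any tag-shaped token, including closing tags such as </b>.
    private static final Pattern ANY_TAG =
            Pattern.compile("</?[a-zA-Z][a-zA-Z0-9]*(?:\\s[^<>]*)?/?>");

    // Counts opening (and self-closing) tags, which roughly corresponds
    // to the number of elements written in the text.
    static int countOpeningTags(String text) {
        Matcher m = OPENING_TAG.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if the text contains at least one tag-shaped token.
    static boolean containsTag(String text) {
        return ANY_TAG.matcher(text).find();
    }

    public static void main(String[] args) {
        // <body> is counted like any other tag, nothing is added or subtracted
        System.out.println(countOpeningTags("<body>this gives a wrong result</body>")); // 1
        System.out.println(countOpeningTags(
                "An <a href='http://example.com/'><b>example</b></a> link.")); // 2
        // a bare "<" or ">" is not treated as a tag
        System.out.println(containsTag("1 < 2 and 3 > 2")); // false
    }
}
```

Because no document is built, nothing is added implicitly, so <html>, <head> and <body> need no special treatment. The price is that malformed or unusual markup may be miscounted.
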
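More failed attempts to make this work: using parseBodyFragment instead of parse does not help, because JSoup sanitizes the input in the same way. Likewise, counting via doc.select("body *") saves you the trouble of subtracting 4, but it still yields a wrong count if one of these structural tags (such as <body>) is involved. Only if your application ensures that no <html>, <head> or <body> elements are present in the strings to be checked does it work, under that restriction.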
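Thanks. doc.select("*") works for me, since my HTML does not contain the tags you mentioned. But yes, I realize a regular expression is better suited to this problem.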