Java: parsing HTML with jsoup and producing serialized output
Can you help me solve a small jsoup problem? My source HTML is:
<html>
<head></head>
<body>
<p class="needed-header">Needed Header</p>
<p class="needed-sub-header">Needed Sub Header 1</p>
<p class="needed-text">Needed Text</p>
<p class="not-needed-text">Not-Needed Text</p>
<p class="needed-sub-header">Needed Sub Header 2</p>
<p class="not-needed-text">Not-Needed Text</p>
<p class="needed-text">Needed Text</p>
</body>
</html>
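Assuming your classes are not really all named with a needed… prefix, you can separate selectors with a comma (,) to build a list of the elements you are looking for, like this: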
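import java.io.File;
import java.util.Scanner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

File myHtmlFile = new File("input.txt");
String htmlToParse = new Scanner(myHtmlFile).useDelimiter("\\A").next();
Document doc = Jsoup.parse(htmlToParse);
Element chapterBody = doc.body();
// Select every paragraph that carries any of the three classes
Elements allElements = chapterBody
        .select("p.needed-header, p.needed-sub-header, p.needed-text");
for (Element el : allElements) {
    System.out.println(el);
}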
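To format each kind of element differently, branch on its class name:
for (Element el : allElements) {
    // className() returns the full class attribute, so equals() only
    // matches elements that carry exactly one class
    if (el.className().equals("needed-header")) {
        System.out.println(">>>>" + el.text() + "<<<<");
    } else if (el.className().equals("needed-sub-header")) {
        System.out.println(">>" + el.text() + "<<");
    } else {
        System.out.println(el.text());
    }
}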
Output of the first loop (printing each element):
<p class="needed-header">Needed Header</p>
<p class="needed-sub-header">Needed Sub Header 1</p>
<p class="needed-text">Needed Text</p>
<p class="needed-sub-header">Needed Sub Header 2</p>
<p class="needed-text">Needed Text</p>
Output of the second loop (formatted text):
>>>>Needed Header<<<<
>>Needed Sub Header 1<<
Needed Text
>>Needed Sub Header 2<<
Needed Text
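For readers who want to try this without an input file, here is a minimal self-contained sketch of the same idea. It is not the answerer's code: the class name NeededParagraphs and the helper format are made up for illustration, the HTML is the sample from the question inlined as a string, and el.hasClass(...) is swapped in for className().equals(...) so the check would still work if an element carried more than one class.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NeededParagraphs {
    // Sample input copied from the question.
    static final String HTML = "<html><head></head><body>"
            + "<p class=\"needed-header\">Needed Header</p>"
            + "<p class=\"needed-sub-header\">Needed Sub Header 1</p>"
            + "<p class=\"needed-text\">Needed Text</p>"
            + "<p class=\"not-needed-text\">Not-Needed Text</p>"
            + "<p class=\"needed-sub-header\">Needed Sub Header 2</p>"
            + "<p class=\"not-needed-text\">Not-Needed Text</p>"
            + "<p class=\"needed-text\">Needed Text</p>"
            + "</body></html>";

    // Hypothetical helper: returns one formatted line per matching paragraph.
    static List<String> format(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // Comma-separated selector: a paragraph matches if it has any of the three classes.
        for (Element el : doc.select("p.needed-header, p.needed-sub-header, p.needed-text")) {
            if (el.hasClass("needed-header")) {
                lines.add(">>>>" + el.text() + "<<<<");
            } else if (el.hasClass("needed-sub-header")) {
                lines.add(">>" + el.text() + "<<");
            } else {
                lines.add(el.text());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        format(HTML).forEach(System.out::println);
    }
}
```

Running main should print the same five formatted lines shown in the output above, with the not-needed-text paragraphs skipped.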
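Comments:
I don't quite understand what you want. Maybe this? Elements els = doc.select("p:contains(needed)");
@乔治戈博佐夫 I suspect the OP has a problem with the order of the elements. It should probably be Header->subheader->text->AnotherSubHeader->text, whereas it is Header->subheader->subheader->text->text.
Elements els = doc.select("p[class^=needed]"); I made a mistake the first time. Now you will get all p tags whose class attribute starts with "needed", in their original order.
Thanks, I will try that too!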
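The attribute-prefix selector suggested in the comments can be checked with a short sketch. The class name PrefixSelector and the helper textsByPrefix are hypothetical, and the HTML is the question's sample inlined as a string. Note that [class^=needed] tests the start of the whole class attribute value, so it would presumably miss an element written as class="foo needed-text".

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class PrefixSelector {
    // Sample input copied from the question.
    static final String HTML = "<html><head></head><body>"
            + "<p class=\"needed-header\">Needed Header</p>"
            + "<p class=\"needed-sub-header\">Needed Sub Header 1</p>"
            + "<p class=\"needed-text\">Needed Text</p>"
            + "<p class=\"not-needed-text\">Not-Needed Text</p>"
            + "<p class=\"needed-sub-header\">Needed Sub Header 2</p>"
            + "<p class=\"not-needed-text\">Not-Needed Text</p>"
            + "<p class=\"needed-text\">Needed Text</p>"
            + "</body></html>";

    // Hypothetical helper: text of every <p> whose class attribute starts with "needed".
    static List<String> textsByPrefix(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element el : doc.select("p[class^=needed]")) {
            texts.add(el.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        textsByPrefix(HTML).forEach(System.out::println);
    }
}
```

With the sample input, "not-needed-text" starts with "not", so those two paragraphs are excluded and the remaining five come back in the order they appear in the document.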