Java: HTML parsing with jsoup, with sequential output

Could you help me with a small jsoup problem? My problem is:

Source HTML code:

<html>
<head></head>
<body>
<p class="needed-header">Needed Header</p>
<p class="needed-sub-header">Needed Sub Header 1</p>
<p class="needed-text">Needed Text</p>
<p class="not-needed-text">Not-Needed Text</p>
<p class="needed-sub-header">Needed Sub Header 2</p>
<p class="not-needed-text">Not-Needed Text</p>
<p class="needed-text">Needed Text</p>
</body>
</html>

Assuming your classes are not really named with needed…, you can use a comma to build a list of the elements you want to find, like this:

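// Requires: java.io.File, java.util.Scanner, org.jsoup.Jsoup,
// org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements

// Read the whole file into one string; the "\\A" delimiter makes next() return the entire input
File myHtmlFile = new File("input.txt");
String htmlToParse = new Scanner(myHtmlFile).useDelimiter("\\A").next();

Document doc = Jsoup.parse(htmlToParse);
Element chapterBody = doc.body();

// A comma in the selector means "any of these selectors"
Elements allElements = chapterBody
        .select("p.needed-header, p.needed-sub-header, p.needed-text");
for (Element el : allElements) {
    System.out.println(el);
}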
// Second loop: print only the text, decorated according to the element's class
for (Element el : allElements) {
    if (el.className().equals("needed-header")) {
        System.out.println(">>>>" + el.text() + "<<<<");
    } else if (el.className().equals("needed-sub-header")) {
        System.out.println(">>" + el.text() + "<<");
    } else {
        System.out.println(el.text());
    }
}
Output of the first loop (each element printed as HTML):

<p class="needed-header">Needed Header</p>
<p class="needed-sub-header">Needed Sub Header 1</p>
<p class="needed-text">Needed Text</p>
<p class="needed-sub-header">Needed Sub Header 2</p>
<p class="needed-text">Needed Text</p>
Output of the second loop (text formatted by class):

>>>>Needed Header<<<<
>>Needed Sub Header 1<<
Needed Text
>>Needed Sub Header 2<<
Needed Text
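
To try the approach above without an input file, it can be sketched as a self-contained program. The class name NeededSelectorDemo, the HTML constant, and the helpers selectNeeded, format, and formatAll are assumptions made for illustration and are not part of the original answer. The sketch also uses hasClass instead of comparing className(), so an element that carries additional classes would still be recognized.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class NeededSelectorDemo {
    // The HTML from the question, inlined so that no input file is needed.
    static final String HTML =
        "<html><head></head><body>"
        + "<p class=\"needed-header\">Needed Header</p>"
        + "<p class=\"needed-sub-header\">Needed Sub Header 1</p>"
        + "<p class=\"needed-text\">Needed Text</p>"
        + "<p class=\"not-needed-text\">Not-Needed Text</p>"
        + "<p class=\"needed-sub-header\">Needed Sub Header 2</p>"
        + "<p class=\"not-needed-text\">Not-Needed Text</p>"
        + "<p class=\"needed-text\">Needed Text</p>"
        + "</body></html>";

    // Selects the three wanted classes with one comma-separated selector.
    static Elements selectNeeded(String html) {
        Document doc = Jsoup.parse(html);
        return doc.body()
                .select("p.needed-header, p.needed-sub-header, p.needed-text");
    }

    // Decorates the element's text depending on which class it has.
    static String format(Element el) {
        if (el.hasClass("needed-header")) {
            return ">>>>" + el.text() + "<<<<";
        }
        if (el.hasClass("needed-sub-header")) {
            return ">>" + el.text() + "<<";
        }
        return el.text();
    }

    // Formats every selected element, keeping the order returned by select().
    static List<String> formatAll(String html) {
        List<String> lines = new ArrayList<>();
        for (Element el : selectNeeded(html)) {
            lines.add(format(el));
        }
        return lines;
    }

    public static void main(String[] args) {
        formatAll(HTML).forEach(System.out::println);
    }
}
```

Keeping the formatting in a small helper separates the selection step from the presentation step, which makes it easier to change the decoration later without touching the selector.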

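I don't quite understand what you want. Maybe this? Elements els = doc.select("p:contains(needed)");

@乔治戈博佐夫 I suspect the OP has a problem with the order of the elements. It should probably be Header->subheader->text->AnotherSubHeader->text, while it is Header->subheader->subheader->text->text?

Elements els = doc.select("p[class^=needed]"); I made a mistake the first time. Now you will get all p tags whose class attribute starts with "needed", in the original order.

Thanks, I'll try that too!!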
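
The difference between the two selectors suggested in the comments can be checked with a small sketch. The class name SelectorComparison, the HTML constant, and the count helper are assumptions made for illustration. The point it tries to show: :contains matches on an element's text and ignores case, so it also picks up the "Not-Needed Text" paragraphs, whereas [class^=needed] looks only at how the class attribute starts.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorComparison {
    // The HTML from the question, inlined for the comparison.
    static final String HTML =
        "<html><head></head><body>"
        + "<p class=\"needed-header\">Needed Header</p>"
        + "<p class=\"needed-sub-header\">Needed Sub Header 1</p>"
        + "<p class=\"needed-text\">Needed Text</p>"
        + "<p class=\"not-needed-text\">Not-Needed Text</p>"
        + "<p class=\"needed-sub-header\">Needed Sub Header 2</p>"
        + "<p class=\"not-needed-text\">Not-Needed Text</p>"
        + "<p class=\"needed-text\">Needed Text</p>"
        + "</body></html>";

    // Returns how many elements in the sample HTML match the given selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // Text-based match: every paragraph's text contains "needed" (case-insensitive).
        System.out.println("p:contains(needed) -> " + count("p:contains(needed)"));
        // Attribute-prefix match: only class values that start with "needed".
        System.out.println("p[class^=needed]   -> " + count("p[class^=needed]"));
    }
}
```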