Java Jsoup: parsing an HTML string

I have a <td> element:

<td id="color" align="center">
Z 29.02-23.05 someText,
<br> 
some.Text2 <a href="man.php?id=111">J. Smith</a> (l.)&nbsp;
</td>

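I only get J. Smith. Unfortunately, I don't know how to parse a tag like <br>.

As far as I know, you can only retrieve the text between two tags, which is not possible with the single <br> tag in your document.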

The only option I can think of is to use split() to get the second part:

// html() keeps the markup; text() strips the tags, so there would be no "<br>" left to split on
String partAfterBr = element.html().split("<br>")[1];
Document relevantPart = Jsoup.parse(partAfterBr);
// do whatever you want with the Document in order to receive the necessary parts
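
The split step on its own can be sketched without jsoup. This is only an illustration, assuming the inner HTML of the cell is already available as a string (for instance from jsoup's Element.html()). The class name SplitAtBr, the helper partAfterBr and the sample input are hypothetical and not part of the original answer. The regular expression is an added precaution so that <br/>, <br /> and upper-case variants are matched too.

```java
import java.util.regex.Pattern;

class SplitAtBr {
    // Matches <br>, <br/>, <br /> and upper-case variants (an assumption about the input markup).
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    /**
     * Returns the markup that follows the first <br> in innerHtml,
     * or an empty string when there is no <br> at all.
     */
    public static String partAfterBr(String innerHtml) {
        // Limit 2: split at the first <br> only and keep the remainder intact.
        String[] parts = BR.split(innerHtml, 2);
        return parts.length == 2 ? parts[1].trim() : "";
    }

    public static void main(String[] args) {
        // Hypothetical input: the inner HTML of the <td> from the question.
        String innerHtml = "Z 29.02-23.05 someText,\n<br>\n"
                + "some.Text2 <a href=\"man.php?id=111\">J. Smith</a> (l.)&nbsp;";
        System.out.println(partAfterBr(innerHtml));
    }
}
```

The returned fragment still contains the <a> tag and the &nbsp; entity, so it would still need to go through Jsoup.parse (as in the snippet above) to get plain text out of it.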
Walking the element's child nodes with childNodes() can save your life:

package com.github.davidepastore.stackoverflow35436825;

import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

/**
 * Stackoverflow 35436825
 *
 */
public class App 
{
    public static void main( String[] args )
    {
        String html = "<html><body><table><tr><td id=\"color\" align=\"center\">" +
                        "Z 29.02-23.05 someText," +
                        "<br>" +
                        "some.Text2 <a href=\"man.php?id=111\">J. Smith</a> (l.)&nbsp;" +
                        "</td></tr></table></body></html>";
        Document doc = Jsoup.parse( html );
        Element td = doc.getElementById( "color" );
        String text = getText( td );
        System.out.println("Text: " + text);
    }

    /**
     * Get the custom text from the given {@link Element}.
     * @param element The {@link Element} from which get the custom text.
     * @return Returns the custom text.
     */
    private static String getText(Element element) {
        StringBuilder working = new StringBuilder();
        List<Node> childNodes = element.childNodes();
        boolean brFound = false;
        // Walk the direct children; collect text only once the <br> has been seen.
        for (Node child : childNodes) {
            if (child instanceof TextNode) {
                if (brFound) {
                    working.append(((TextNode) child).text());
                }
            } else if (child instanceof Element) {
                Element childElement = (Element) child;
                if (brFound) {
                    working.append(childElement.text());
                }
                if (childElement.tagName().equals("br")) {
                    brFound = true;
                }
            }
        }
        return working.toString();
    }
}

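@Rao, thanks. I didn't know about XPath. I'll go look for information.
@LordAnomander Exactly, I forgot about split... That works well too. Thanks.
Let's sort this out, okay!! Thank you very much! I'll study Node in more detail, it's cool.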

Program output:

Text: some.Text2 J. Smith (l.)