Getting a substring from a given string in Java

I read the content of a web page and parse it with the Jsoup parser to get only the hyperlinks present in the body section. The output I get is:

<a href="/sports/sports.asp" style="TEXT-DECORATION: NONE"><font color="#0000FF">Sports</font></a>
<a href="/titanic/titanic.asp" style="TEXT-DECORATION: NONE"><font color="#0000FF">Titanic</font></a>
<a href="gastheft.asp" onmouseover="window.status='License Plate Theft';return true" onmouseout="window.status='';return true">license plates</a>
<a href="miracle.asp" onmouseover="window.status='Miracle Cars';return true" onmouseout="window.status='';return true">miracle cars</a>
<a href="/crime/warnings/clear.asp" onmouseover="window.status='Clear Loss';return true" onmouseout="window.status='';return true" target="clear">Clear</a>

and even more hyperlinks.

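How can I extract just the href values (for example /sports/sports.asp) using String methods, or is there some other way to extract this information with the Jsoup parser itself?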

This can be done using String.indexOf, with the necessary checks.

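Suppose the string anchor contains one of the links. The substring's begin index is then the position right after href=" and the end index is the first double quote at or after that position (index 9 in this example), like this: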

String anchor = "<a href=\"/sports/sports.asp\" style=\"TEXT-DECORATION: NONE\"><font color=\"#0000FF\">Sports</font></a>";
int beginIndex = anchor.indexOf("href=\"") + 6; //To start after <a href="
int endIndex = anchor.indexOf("\"", beginIndex);
String desiredPart = anchor.substring(beginIndex, endIndex);
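This only works if the anchor always has this shape. A better option is a regular expression, and better still is a proper XML/HTML parser.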
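The "necessary checks" mentioned above could look something like the following. This is only a minimal sketch: the class name HrefIndexOf and the method extractHrefs are made up for illustration, and it assumes every attribute is written exactly as href="..." (lowercase, double quotes), which is the shape of the links in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class HrefIndexOf {

    // Returns every double-quoted href value found in the input, in document order.
    // Assumption: attributes are written exactly as href="..." (lowercase, double quotes).
    public static List<String> extractHrefs(String html) {
        final String marker = "href=\"";
        List<String> hrefs = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(marker, from);
            if (start < 0) {
                break; // check 1: no further href attribute
            }
            int beginIndex = start + marker.length();
            int endIndex = html.indexOf('"', beginIndex);
            if (endIndex < 0) {
                break; // check 2: unterminated value, stop instead of throwing
            }
            hrefs.add(html.substring(beginIndex, endIndex));
            from = endIndex + 1; // continue searching after the closing quote
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/sports/sports.asp\" style=\"TEXT-DECORATION: NONE\">Sports</a>\n"
                + "<a href=\"gastheft.asp\">license plates</a>";
        System.out.println(extractHrefs(html)); // prints [/sports/sports.asp, gastheft.asp]
    }
}
```

Checking both indexOf results for -1 avoids the StringIndexOutOfBoundsException that substring would otherwise throw on malformed input.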

Use this as a reference:

import java.util.regex.*;

public class HelloWorld{

     public static void main(String []args){

         String s = "<a href=\"/sports/sports.asp\" style=\"TEXT-DECORATION: NONE\"><font color=\"#0000FF\">Sports</font></a>"+
                    "<a href=\"/titanic/titanic.asp\" style=\"TEXT-DECORATION: NONE\"><font color=\"#0000FF\">Titanic</font></a>"+
                    "<a href=\"gastheft.asp\" onmouseover=\"window.status='License Plate Theft';return true\" onmouseout=\"window.status='';return true\">license plates</a>"+
                    "<a href=\"miracle.asp\" onmouseover=\"window.status='Miracle Cars';return true\" onmouseout=\"window.status='';return true\">miracle cars</a>"+
                    "<a href=\"/crime/warnings/clear.asp\" onmouseover=\"window.status='Clear Loss';return true\" onmouseout=\"window.status='';return true\" target=\"clear\">Clear</a>";
       // Capture only the attribute value, so hrefs that contain '=' are not cut short
       Pattern p = Pattern.compile("href=\"(.+?)\"");
       Matcher m = p.matcher(s);
       while(m.find())
       {
           System.out.println(m.group(1)); // group(1) is the text between the quotes
       }

     }
}
Try this, it may help:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();

String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
// attr("href") already returns the bare value, so no further indexOf/substring step is needed

You can try this; it works:

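import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class AttributeParsing {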

/**
 * @param args
 */
public static void main(String[] args) {
    final String html = "<a href=\"/sports/sports.asp\" style=\"TEXT-DECORATION: NONE\"><font color=\"#0000FF\">Sports</font></a>";

    Document doc = Jsoup.parse(html, "", Parser.xmlParser());
    Element th = doc.select("a[href]").first();

    String href = th.attr("href");

    System.out.println(th);
    System.out.println(href);
}
}

Output:

th: the matched <a> element, printed as markup
href: /sports/sports.asp

You can do it in one line:

String[] paths = str.replaceAll("(?m)^.*?\"(.*?)\".*?$", "$1").split("(?ms)$.*?^");
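The first method call removes everything except the target from each line; the second splits on line breaks and works with the line terminators of all operating systems.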


FYI, (?m) turns on multiline mode, and (?ms) additionally turns on the dotall flag.
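To see what the two calls do on multi-line input, here is a small runnable sketch. The class name OneLinerDemo and the helper firstQuotedPerLine are illustrative only, and the approach assumes the href is the first double-quoted value on each line and that every line holds one anchor.

```java
import java.util.Arrays;

public class OneLinerDemo {

    // Step 1 (replaceAll): with (?m), ^ and $ match per line, so each whole line is
    // replaced by its first double-quoted value.
    // Step 2 (split): with (?ms), "$.*?^" matches just the line terminator between lines.
    public static String[] firstQuotedPerLine(String str) {
        return str.replaceAll("(?m)^.*?\"(.*?)\".*?$", "$1").split("(?ms)$.*?^");
    }

    public static void main(String[] args) {
        String str = "<a href=\"/sports/sports.asp\" style=\"TEXT-DECORATION: NONE\">Sports</a>\n"
                + "<a href=\"miracle.asp\">miracle cars</a>";
        System.out.println(Arrays.toString(firstQuotedPerLine(str)));
        // prints [/sports/sports.asp, miracle.asp]
    }
}
```

Note that a line whose first quoted attribute is not href (for example style="..." written before href) would yield the wrong value, so this shortcut only suits input shaped like the question's.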

Thanks, man. This is exactly the answer I was looking for:
/sports/sports.asp
/titanic/titanic.asp
gastheft.asp
miracle.asp
/crime/warnings/clear.asp