Java: My regular expression has a problem detecting URLs in a string?

java regex url

Hi everyone. I am using the following regular expression to detect URLs in a string and wrap them in anchor (<a>) tags:

public static String detectUrls(String text) {

        String newText = text
                .replaceAll("(?:https?|ftps?|http?)://[\\w/%.-?&=]+",
                        "<a href='$0'>$0</a>").replaceAll(
                        "(www\\.)[\\w/%.-?&=]+", "<a href='http://$0'>$0</a>");
        return newText;
    }


My problem is that the following links are not detected correctly. I'm not very good with regular expressions, so please advise.

www.liferay.com/web/raymond.auge/blog/

(www.opensocial.org/)

I am using this:

private static final String URL_REGEX = 
   "http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&amp;\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?";

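// Compile once and reuse (needs java.util.regex.Pattern and java.util.regex.Matcher).
private static final Pattern URL_PATTERN = Pattern.compile(URL_REGEX);

// Inside the method that receives `text`:
Matcher matcher = URL_PATTERN.matcher(text);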
text = matcher.replaceAll("<a href=\"$0\">$0</a>");
return text;


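The problem is that you use - inside the character class ([]) without escaping it, so it defines the range .-? (that is, the characters ./0123456789:;<=>?). As a result the class accepts characters such as < and > but not a literal hyphen. Either escape it as \\- or put it at the end of the character class so that it does not form a range.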

public static String detectUrls(String text) {
    String newText = text
            .replaceAll("(?:https?|ftps?|http?)://[\\w/%.\\-?&=]+",
                    "<a href='$0'>$0</a>").replaceAll(
                    "(www\\.)[\\w/%.\\-?&=]+", "<a href='http://$0'>$0</a>");
    return newText;
}
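
To make the range effect concrete, here is a minimal sketch that compares the two character classes one character at a time. The class name CharClassRangeDemo and its helper methods are made up for this illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares an unescaped and an escaped hyphen in a character class.
final class CharClassRangeDemo {
    // Unescaped hyphen between '.' and '?': a range from U+002E to U+003F.
    static final Pattern RANGE = Pattern.compile("[.-?]");
    // Escaped hyphen: the class contains exactly '.', '-' and '?'.
    static final Pattern LITERAL = Pattern.compile("[.\\-?]");

    static boolean inRange(String s) {
        return RANGE.matcher(s).matches();
    }

    static boolean inLiteral(String s) {
        return LITERAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Print how each single-character string is classified by both patterns.
        for (String s : new String[] {"-", "5", "<", "?"}) {
            System.out.println("'" + s + "' range=" + inRange(s) + " literal=" + inLiteral(s));
        }
    }
}
```

The hyphen itself (U+002D) sits just below '.' in ASCII, so the range form rejects it while still letting digits and angle brackets through.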


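As marcog said, you should escape the -. To match the last two examples you gave, you also have to make the http optional. In addition, http? matches htt, which is not a valid protocol.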

So the regex would be:

"(?:(?:https?|ftps?)://)?[\\w/%.?&=-]+"

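As a rough check of the expression above, the following sketch runs it against the sample links from the question. The class OptionalSchemeDemo and its method names are hypothetical, chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class exercising the optional-scheme pattern.
final class OptionalSchemeDemo {
    // Pattern from the answer: optional scheme, hyphen placed last in the class.
    static final Pattern URL = Pattern.compile("(?:(?:https?|ftps?)://)?[\\w/%.?&=-]+");
    // Scheme alternation from the question, including the "http?" branch.
    static final Pattern OLD_SCHEME = Pattern.compile("(?:https?|ftps?|http?)://");

    // True when the whole candidate string is accepted by the pattern.
    static boolean matchesWhole(String candidate) {
        return URL.matcher(candidate).matches();
    }

    // Returns the first match found inside a longer text, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = URL.matcher(text);
        return m.find() ? m.group() : null;
    }

    // True when the question's scheme alternation accepts the given prefix.
    static boolean oldSchemeAccepts(String prefix) {
        return OLD_SCHEME.matcher(prefix).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("www.liferay.com/web/raymond.auge/blog/"));
        System.out.println(firstMatch("(www.opensocial.org/)"));
        System.out.println(oldSchemeAccepts("htt://"));
    }
}
```

One caveat worth keeping in mind: because the scheme is optional, the character class on its own also accepts ordinary words such as hello, so this form is better suited to checking candidate strings than to scanning free text.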
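Check the statement: `&` instead of `&amp;` would be enough, because `a`, `m` and `p` are already in the `a-z` range, and `;` is listed twice.

This pattern works fine in most cases, but it does not catch this case: (www.opensocial.org)

@marcog: actually there is one pattern that is still not caught: something like http: //www.google.com

@sword: is the space after http: a typo?

@marcog, yes, I added it on purpose. Without the space the editor converts it into a link, so I added it to skip the editor's formatting. Do you see what I mean?

@marcog, what do you suggest?

@Swarm: swap the replaceAll() calls and use …

It works, with one small problem: it also adds http:// in front of the displayed URL.

That is a limitation of regular expressions. To fix it, you would have to parse the text in a single pass, without using regular expressions.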
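
As a possible middle ground that is not from this thread, a single pass can still use java.util.regex by building each replacement by hand with Matcher.appendReplacement, so that http:// goes only into the href and not into the displayed text. This is a minimal sketch: the class name SinglePassLinker and the combined pattern are assumptions made for this example, and it does not handle trailing punctuation or HTML escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass variant of detectUrls; not the code from the answers above.
final class SinglePassLinker {
    // Either an explicit scheme or a leading "www.", followed by the character class
    // from the answers (hyphen last so it is a literal).
    private static final Pattern URL =
            Pattern.compile("(?:(?:https?|ftps?)://|www\\.)[\\w/%.?&=-]+");

    static String detectUrls(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String shown = m.group();
            // Prefix the scheme only in the href, never in the visible link text.
            String href = shown.startsWith("www.") ? "http://" + shown : shown;
            String link = "<a href='" + href + "'>" + shown + "</a>";
            // quoteReplacement keeps '$' and '\' in the link from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(link));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(detectUrls("see (www.opensocial.org/) and http://www.liferay.com/web/raymond.auge/blog/ now"));
    }
}
```

Because every URL is matched exactly once, already generated <a> tags are never rescanned by a second replaceAll() call.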