Java: how to detect the presence of a URL in a string

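I have an input string, say:

Please go to http://stackoverflow.com

Many browsers/IDEs/applications detect the URL part of the string and automatically add an anchor, so it becomes:

Please go to <a href='http://stackoverflow.com'>http://stackoverflow.com</a>

I need to do the same thing using Java.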
You could do something like this (adjust the regex to suit your needs):
String originalString = "Please go to http://www.stackoverflow.com";
String newString = originalString.replaceAll("http://.+?(com|net|org)/{0,1}", "<a href=\"$0\">$0</a>");
A primitive version:
String msg = "Please go to http://stackoverflow.com";
String withURL = msg.replaceAll("(?:https?|ftps?)://[\\w/%.-]+", "<a href='$0'>$0</a>");
System.out.println(withURL);

String msg=“请转到http://stackoverflow.com";
URL=msg.replaceAll的字符串（（？：https？| ftps？）://[\\w/%.-]+”，“”）；
System.out.println（带URL）；

This needs improvement to match proper URLs, particularly ones with GET parameters (?foo=bar&x=25).
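To make that limitation concrete, here is a minimal sketch that runs the primitive pattern on a URL with GET parameters, next to a variant whose character class also admits ?, =, & and #. The class name QueryParamDemo and the extended character class are illustrative assumptions, not a complete URL grammar.

```java
public class QueryParamDemo {
    // The primitive pattern from the answer above: '?', '=', '&' and '#' are not allowed.
    static final String PRIMITIVE = "(?:https?|ftps?)://[\\w/%.-]+";
    // Illustrative variant: also accepts the characters used in query strings and fragments.
    static final String WITH_QUERY = "(?:https?|ftps?)://[\\w/%.?=&#-]+";

    // Wraps every match of the given regex in an anchor tag.
    static String link(String text, String regex) {
        return text.replaceAll(regex, "<a href='$0'>$0</a>");
    }

    public static void main(String[] args) {
        String msg = "See http://example.com/search?foo=bar&x=25 now";
        // Stops at '?', so the query string ends up outside the link.
        System.out.println(link(msg, PRIMITIVE));
        // Captures the whole URL, including the query string.
        System.out.println(link(msg, WITH_QUERY));
    }
}
```

The extended class still swallows sentence punctuation such as a trailing period, which is the kind of problem the Atwood-style answers below deal with.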
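You're asking two separate questions.
What is the best way to identify URLs in strings?
See:
How do you code that solution in Java? The other responses illustrating String.replaceAll usage have already addressed this.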
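While it's not Java-specific, Jeff Atwood recently posted an article about the pitfalls you might run into when trying to locate and match URLs in arbitrary text:

It gives a good regex that can be used together with the snippet of code you need to handle parentheses properly (more or less).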
The regex:
\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]

Paren cleanup (the article's snippet is C#; this is the Java equivalent):
if (s.startsWith("(") && s.endsWith(")")) {
    return s.substring(1, s.length() - 1);
}
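Combined in Java, the article's regex and the paren cleanup might look like the minimal sketch below. It only handles http (as the regex does) and strips a single surrounding pair of parentheses per match; the class name AtwoodUrlFinder and the method findUrls are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AtwoodUrlFinder {
    // The regex from the article, escaped for a Java string literal.
    private static final Pattern URL = Pattern.compile(
        "\\(?\\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    // Returns every match, with one surrounding pair of parentheses removed if present.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String s = m.group();
            // Paren cleanup: "(http://example.com)" becomes "http://example.com".
            if (s.startsWith("(") && s.endsWith(")")) {
                s = s.substring(1, s.length() - 1);
            }
            urls.add(s);
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(findUrls(
            "Visit (http://example.com) or http://en.wikipedia.org/wiki/Foo_(bar)."));
    }
}
```

Because the last character must come from the smaller "terminal" class, the trailing period is dropped, while the Wikipedia-style closing parenthesis is kept.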

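Use java.net.URL!!
Hey, why not use the core Java class for this, java.net.URL, and let it validate the URL?
Although the following code violates the golden principle "use exceptions only for exceptional conditions", it makes no sense to me to try to reinvent the wheel for something that is very mature on the Java platform.
Here's the code: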
import java.net.MalformedURLException;
import java.net.URL;

// Replaces URLs with HTML anchor (href) tags
public class URLInString {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java URLInString \"text with URLs\"");
            return;
        }
        String s = args[0];
        // Separate the input on whitespace (URLs don't contain spaces)
        String[] parts = s.split("\\s+");

        // Attempt to convert each item into a URL
        for (String item : parts) {
            try {
                // Note: the URL(String) constructor is deprecated since Java 20 but still works
                URL url = new URL(item);
                // It parsed, so replace it with an anchor...
                System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
            } catch (MalformedURLException e) {
                // ...otherwise it is not a URL: print the word unchanged
                System.out.print(item + " ");
            }
        }

        System.out.println();
    }
}

It produces the following output:
Please go to <a href="http://stackoverflow.com">http://stackoverflow.com</a> and then <a href="mailto:oscarreyes@wordpress.com">mailto:oscarreyes@wordpress.com</a> to download a file from <a href="ftp://user:pass@someserver/someFile.txt">ftp://user:pass@someserver/someFile.txt</a>

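Of course, you can also use the other parts of the URL, such as the port, file, query or ref (getPort(), getFile(), getQuery(), getRef(), etc.).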
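A small sketch of those accessors on a made-up example URL. It builds the URL through java.net.URI because the URL(String) constructor is deprecated as of Java 20; the helper name toUrl is hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UrlParts {
    // Builds a URL via URI instead of the deprecated URL(String) constructor.
    static URL toUrl(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    public static void main(String[] args) {
        URL url = toUrl("http://example.com:8080/path/page?x=1&y=2#frag");
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // example.com
        System.out.println(url.getPort());     // 8080
        System.out.println(url.getFile());     // /path/page?x=1&y=2 (path plus query)
        System.out.println(url.getQuery());    // x=1&y=2
        System.out.println(url.getRef());      // frag
    }
}
```

If you switch the answer's loop to URI.create(item).toURL(), note that ordinary words then fail with IllegalArgumentException (for example "URI is not absolute") rather than MalformedURLException, so the catch clause would need to change as well.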

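It handles all protocols (at least all of those the Java platform knows), and as an extra benefit, if a URL scheme that Java doesn't currently recognize eventually gets incorporated into the URL class (through a library update), you'll get it transparently.
A nice improvement on PhiLho's answer would be: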
msg.replaceAll("(?:https?|ftps?)://[\\w/%.-][/\\??\\w=?\\w?/%.-]?[/\\?&\\w=?\\w?/%.-]*", "<a href='$0'>$0</a>");
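The following code makes these modifications to the "Atwood approach":
It detects https in addition to http (adding other schemes is trivial).
The CASE_INSENSITIVE flag is used, since HtTpS:// is valid.
Matching sets of parentheses are peeled off (they can be nested to any level). Further, any remaining unmatched left parentheses are stripped, but trailing right parentheses are left intact (to respect Wikipedia-style URLs).
The URL is HTML-encoded in the link text.
The target attribute is passed in via a method parameter. Other attributes can be added as needed.
It does not use \b to identify a word break before matching a URL. A URL can begin with a left parenthesis or http[s]:// with no other requirement.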
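Notes:

Apache Commons Lang's StringUtils is used in the code below.
The call to HtmlUtil.encode() below is a utility that ultimately calls some Tomahawk code to HTML-encode the link text, but any similar utility will do.
See the method comment for usage in JSF or other environments where output is HTML-encoded by default.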
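The answer does not show HtmlUtil.encode(). If Tomahawk is not available, a minimal stand-in might escape the five characters that matter in HTML text and quoted attributes. SimpleHtml is a hypothetical name and this is not Tomahawk's implementation; it returns an empty string for null, which matches what the hyperlinkText() Javadoc promises for null input.

```java
public final class SimpleHtml {
    private SimpleHtml() {}

    // Hypothetical stand-in for HtmlUtil.encode(): escapes &, <, >, " and '
    // so the text is safe inside HTML content and quoted attribute values.
    public static String encode(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```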

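This was written for our client's requirements, and we feel it represents a reasonable compromise between the characters allowed by the RFC and common usage. It is offered here in the hope that it will be useful to others.
Further expansion could allow any Unicode characters to be entered (i.e. not escaped as %XX, two-digit hex) and hyperlinked, but that would require accepting all Unicode letters plus limited punctuation, then splitting on the "acceptable" delimiters (e.g. %, |, #, etc.), URL-encoding each part, and gluing the parts back together. For example, http://en.wikipedia.org/wiki/Björn_Andrésen (which the Stack Overflow generator does not detect) would be "http://en.wikipedia.org/wiki/Bj%C3%B6rn_Andr%C3%A9sen" in the href, but would show Björn_Andrésen in the link text on the page.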
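// Imports used by the code below (place them at the top of the enclosing file).
// HtmlUtil is the Tomahawk-based encoder mentioned above; its import is not shown.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils; // or org.apache.commons.lang.StringUtils (Commons Lang 2.x)

// NOTES:   1) \w includes 0-9, a-z, A-Z, _
//          2) The leading '-' is the literal '-' character. It must go first in the character class expression
private static final String VALID_CHARS = "-\\w+&@#/%=~()|";
private static final String VALID_NON_TERMINAL = "?!:,.;";

// Notes on the expression:
//  1) Any number of leading '(' (left parentheses) accepted.  Will be dealt with.
//  2) s? ==> the s is optional so either [http, https] accepted as scheme
//  3) Any number of valid or non-terminal chars, followed by exactly one valid (terminal) char
//  4) Case insensitive so that the scheme can be hTtPs (for example) if desired
private static final Pattern URI_FINDER_PATTERN = Pattern.compile("\\(*https?://["+ VALID_CHARS + VALID_NON_TERMINAL + "]*[" +VALID_CHARS + "]", Pattern.CASE_INSENSITIVE );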

/**
 * <p>
 * Finds all "URL"s in the given _rawText, wraps them in 
 * HTML link tags and returns the result (with the rest of the text
 * html encoded).
 * </p>
 * <p>
 * We employ the procedure described at:
 * http://www.codinghorror.com/blog/2008/10/the-problem-with-urls.html
 * which is a <b>must-read</b>.
 * </p>
 * <p>
 * Basically, we allow any number of left parentheses (which will get stripped away)
 * followed by http:// or https://.  Then any number of permitted URL characters
 * (based on http://www.ietf.org/rfc/rfc1738.txt) followed by a single character
 * of that set (basically, those minus typical punctuation).  We remove all sets of 
 * matching left & right parentheses which surround the URL.
 *</p>
 * <p>
 * This method *must* be called from a tag/component which will NOT
 * end up escaping the output.  For example:
 * <PRE>
 * <h:outputText ... escape="false" value="#{core:hyperlinkText(textThatMayHaveURLs, '_blank')}"/>
 * </pre>
 * </p>
 * <p>
 * Reason: we are adding <code>&lt;a href="..."&gt;</code> tags to the output *and*
 * encoding the rest of the string.  So, encoding the output will result in
 * double-encoding data which was already encoded - and encoding the <code>a href</code>
 * (which will render it useless).
 * </p>
 * <p>
 * 
 * @param   _rawText  - if <code>null</code>, returns <code>""</code> (empty string).
 * @param   _target   - if not <code>null</code> or <code>""</code>, adds a target attribute to the generated link, using _target as the attribute value.
 */
public static final String hyperlinkText( final String _rawText, final String _target ) {

    String returnValue = null;

    if ( !StringUtils.isBlank( _rawText ) ) {

        final Matcher matcher = URI_FINDER_PATTERN.matcher( _rawText );

        if ( matcher.find() ) {

            final int originalLength    =   _rawText.length();

            final String targetText = ( StringUtils.isBlank( _target ) ) ? "" :  " target=\"" + _target.trim() + "\"";
            final int targetLength      =   targetText.length();

            // Counted 15 characters aside from the target + 2 of the URL (max if the whole string is URL)
            // Rough guess, but should keep us from expanding the Builder too many times.
            final StringBuilder returnBuffer = new StringBuilder( originalLength * 2 + targetLength + 15 );

            int currentStart;
            int currentEnd;
            int lastEnd     = 0;

            String currentURL;

            do {
                currentStart = matcher.start();
                currentEnd = matcher.end();
                currentURL = matcher.group();

                // Adjust for URLs wrapped in ()'s ... move start/end markers
                //      and substring the _rawText for new URL value.
                while ( currentURL.startsWith( "(" ) && currentURL.endsWith( ")" ) ) {
                    currentStart = currentStart + 1;
                    currentEnd = currentEnd - 1;

                    currentURL = _rawText.substring( currentStart, currentEnd );
                }

                while ( currentURL.startsWith( "(" ) ) {
                    currentStart = currentStart + 1;

                    currentURL = _rawText.substring( currentStart, currentEnd );
                }

                // Text since last match
                returnBuffer.append( HtmlUtil.encode( _rawText.substring( lastEnd, currentStart ) ) );

                // Wrap matched URL (the link text is HTML-encoded; the href is used as matched)
                returnBuffer.append( "<a href=\"" + currentURL + "\"" + targetText + ">" + HtmlUtil.encode( currentURL ) + "</a>" );

                lastEnd = currentEnd;

            } while ( matcher.find() );

            if ( lastEnd < originalLength ) {
                returnBuffer.append( HtmlUtil.encode( _rawText.substring( lastEnd ) ) );
            }

            returnValue = returnBuffer.toString();
        }
    } 

    if ( returnValue == null ) {
        returnValue = HtmlUtil.encode( _rawText );
    }

    return returnValue;

}
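The Unicode extension described before the code is not implemented in it. As a rough sketch of the href side only, this swaps in java.net.URI instead of the hand-written split-and-encode approach: URI accepts non-ASCII ("other") characters, and toASCIIString() percent-encodes them as UTF-8, while the original string can still serve as the link text. The class UnicodeHref is hypothetical, and characters that are illegal in a URI (spaces, for example) would still make URI.create() throw.

```java
import java.net.URI;

public class UnicodeHref {
    // Percent-encodes non-ASCII characters (as UTF-8 %XX) for use in the href.
    static String toHref(String unicodeUrl) {
        return URI.create(unicodeUrl).toASCIIString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only; this is .../wiki/Björn_Andrésen
        String text = "http://en.wikipedia.org/wiki/Bj\u00f6rn_Andr\u00e9sen";
        // href gets the encoded form, the link text keeps the readable form
        System.out.println("<a href=\"" + toHref(text) + "\">" + text + "</a>");
    }
}
```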

To detect URLs, all you need is something like this:
if (yourtextview.getText().toString().contains("www") || yourtextview.getText().toString().contains("http://")) { /* your code here if it contains a URL */ }
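Note that contains("http://") misses https:// links, and contains("www") also fires on words such as "awwwards". A slightly stricter sketch, assuming a simple regex is good enough for a yes/no check (UrlCheck and containsUrl are hypothetical names):

```java
import java.util.regex.Pattern;

public class UrlCheck {
    // Case-insensitive: "http://", "https://" or "www." at a word boundary,
    // followed by at least one non-whitespace character.
    private static final Pattern URL_HINT =
        Pattern.compile("(?i)\\b(?:https?://|www\\.)\\S+");

    static boolean containsUrl(String text) {
        return text != null && URL_HINT.matcher(text).find();
    }
}
```

On Android it could be called as containsUrl(yourtextview.getText().toString()).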

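I wrote my own URI/URL extractor and figured someone might find it useful, since IMHO it is better than the other answers because:

It's stream-based and can be used on large documents.
It's extensible and handles all kinds of problems through a strategy chain.

Since the code is a bit long for a post (although it's only one Java file), I've hosted it separately.
Here is the signature of one of the main methods used to call it, showing how it delivers the points above: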
public static Iterator<ExtractedURI> extractURIs(
    final Reader reader,
    final Iterable<ToURIStrategy> strategies,
    String ... schemes);
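The extractor itself is not reproduced here. As a rough illustration of the stream-based idea only, java.util.Scanner (Java 9+) can search a Reader incrementally with findAll(Pattern), so the input does not have to be read into one String first. Per its Javadoc, Scanner may still buffer large stretches of input while looking for a match. This is not the author's implementation; StreamingUrlScan and the deliberately simple pattern are assumptions.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamingUrlScan {
    // Deliberately simple: scheme followed by a run of non-whitespace characters.
    private static final Pattern SIMPLE_URL = Pattern.compile("https?://\\S+");

    static List<String> scan(Reader reader) {
        // Scanner pulls characters from the Reader in chunks as it searches.
        try (Scanner scanner = new Scanner(reader)) {
            return scanner.findAll(SIMPLE_URL)
                          .map(MatchResult::group)
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(scan(new StringReader("a http://a.example b\nc https://b.example/x d")));
    }
}
```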

There is a default strategy chain that handles most of the Atwood problems:
public static List<ToURIStrategy> DEFAULT_STRATEGY_CHAIN = ImmutableList.of(
    new RemoveSurroundsWithToURIStrategy("'"),
    new RemoveSurroundsWithToURIStrategy("\""),
    new RemoveSurroundsWithToURIStrategy("(", ")"),
    new RemoveEndsWithToURIStrategy("."),
    DEFAULT_STRATEGY,
    REMOVE_LAST_STRATEGY);
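The strategy classes are defined in the author's file and are not shown. Purely as a hypothetical illustration of the chaining idea, each cleanup step below is a UnaryOperator&lt;String&gt; applied in order to a candidate URL; surroundedBy and endingWith are made-up helpers, not the author's classes, and they loosely mirror only the first four entries of the chain above.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class CleanupChain {
    // Strips one leading 'open' and trailing 'close' pair if both are present.
    static UnaryOperator<String> surroundedBy(String open, String close) {
        return s -> (s.length() >= open.length() + close.length()
                && s.startsWith(open) && s.endsWith(close))
                ? s.substring(open.length(), s.length() - close.length())
                : s;
    }

    // Strips a single trailing suffix, such as a sentence-ending period.
    static UnaryOperator<String> endingWith(String suffix) {
        return s -> s.endsWith(suffix) ? s.substring(0, s.length() - suffix.length()) : s;
    }

    // Hypothetical analogue of the first four steps of DEFAULT_STRATEGY_CHAIN.
    static final List<UnaryOperator<String>> CHAIN = List.of(
        surroundedBy("'", "'"),
        surroundedBy("\"", "\""),
        surroundedBy("(", ")"),
        endingWith("."));

    // Applies each step in order to the candidate string.
    static String clean(String candidate) {
        String s = candidate;
        for (UnaryOperator<String> step : CHAIN) {
            s = step.apply(s);
        }
        return s;
    }
}
```

Because each step is independent, new cases can be handled by adding a step to the list rather than by growing a single regex.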

Enjoy!
I made a small library that does exactly this:

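Some tricky examples and the links it detects (the detected link is marked with angle brackets):

http://example.com.
→ <http://example.com>.
http://example.com,
→ <http://example.com>,
(http://example.com)
→ (<http://example.com>)
(... (see http://example.com))
→ (... (see <http://example.com>))
https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)
→ <https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)>
http://üñîøðé.com/
→ <http://üñîøðé.com/>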
A more convenient approach, if you are on Android, is to let the TextView detect links itself:
<TextView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:autoLink="web"
    android:linksClickable="true"/>