Ruby on Rails regex - find all links in a tweet


ruby-on-rails regex


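My regex is poor and letting me down, so some help here would be great.

I just want to return all the links that appear in a tweet (which is just a string). Some examples:

"Great summary http://mytest.com/blog/post.html (#test)"

"http://mytest.com/blog/post.html (#test)"

"Post: http://mytest.com/blog/post.html"

It should also support multiple links, like:

"Read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html"

Any help would be great.

Thanks

Ben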

Try this:

/\bhttps?:\/\/\S+\b/
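A quick way to try that pattern in Ruby is `String#scan`, which returns every non-overlapping match as an array of strings. The sketch below runs it over the example tweets from the question; `extract_links` is just an illustrative name. The trailing `\b` forces the match to end on a word character, so punctuation directly after a URL (such as a full stop) is left out.

```ruby
# Pattern from the answer: "http://" or "https://", a run of non-whitespace,
# and a final word boundary so trailing punctuation is not included.
LINK_RE = /\bhttps?:\/\/\S+\b/

# Hypothetical helper: returns every link found in the tweet, in order.
def extract_links(tweet)
  tweet.scan(LINK_RE)
end

[
  "Great summary http://mytest.com/blog/post.html (#test)",
  "http://mytest.com/blog/post.html (#test)",
  "Post: http://mytest.com/blog/post.html",
  "Read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html"
].each { |tweet| p extract_links(tweet) }
# The first three print ["http://mytest.com/blog/post.html"]; the last prints
# ["http://mytest.com/blog/post.html", "http://mytest.com/blog/post_two.html"]
```

The same trailing `\b` also drops a final `/`, so "http://mytest.com/" comes back as "http://mytest.com".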

Update:

To also catch links that start with "www." (without an "http://" prefix), you could try this:

/\b(?:https?:\/\/|www\.)\S+\b/
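A minimal Ruby check of the updated pattern, using the cases raised in the comments on this answer. The non-capturing group `(?:...)` keeps `scan` returning whole matches rather than group captures; `extract_links` is again an illustrative name.

```ruby
# Updated pattern: either an http(s):// prefix or a bare "www." prefix.
LINK_OR_WWW_RE = /\b(?:https?:\/\/|www\.)\S+\b/

# Hypothetical helper: returns every link found in the tweet, in order.
def extract_links(tweet)
  tweet.scan(LINK_OR_WWW_RE)
end

p extract_links("I really like www.this-site.com.")
# prints ["www.this-site.com"]
p extract_links("Wow, stackoverflow.com is awesome!")
# prints [] because a bare domain with neither prefix is still not matched
```

If the matches are used as `href` values, the "www." ones still need "http://" prepended.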

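Below is a snippet from a site I wrote that parses a Twitter feed. It parses links, hashtags and Twitter usernames, and it has worked well so far. I know it's not Ruby, but the regexes should be useful.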

// Requires: using System.Text.RegularExpressions;
// tweetStream, i, tweets, string1 and string2 come from the surrounding loop.
if (tweetStream[i] != null)
{
    var str = tweetStream[i].Text;

    // Links: "http://" or "https://" followed by non-whitespace characters.
    // Regex.Replace wraps each match exactly once; calling string.Replace once
    // per match would re-wrap an already-linked URL that occurs twice in the tweet.
    str = Regex.Replace(str, @"https?://\S+",
        "<a href='$0' target='_blank'>$0</a>");

    // Usernames: "@name" links to the profile; $1 is the name without "@".
    str = Regex.Replace(str, @"@(\w+)",
        "<a href='http://twitter.com/$1' target='_blank'>$0</a>");

    // Hashtags: "#tag" links to a search, with "#" URL-encoded as "%23".
    str = Regex.Replace(str, @"#(\w+)",
        "<a href='http://twitter.com/#search?q=%23$1' target='_blank'>$0</a>");

    tweets += string1 + "<div>" + str + "</div>" + string2;
}
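Since the question is about Ruby, here is a rough port of the same three substitutions using `String#gsub` with a block. `linkify` is a hypothetical name, and, like the C# code, it does no HTML escaping of the tweet text. Because `gsub` rewrites every match in a single pass, a URL that appears twice is wrapped once per occurrence.

```ruby
# Hypothetical Ruby port of the C# snippet: links first, then @mentions,
# then #hashtags. Like the original, it does no HTML escaping.
def linkify(text)
  # Links: http:// or https:// followed by non-whitespace.
  html = text.gsub(%r{https?://\S+}) do |url|
    "<a href='#{url}' target='_blank'>#{url}</a>"
  end
  # Mentions: $1 is the username without the "@".
  html = html.gsub(/@(\w+)/) do
    "<a href='http://twitter.com/#{$1}' target='_blank'>@#{$1}</a>"
  end
  # Hashtags: "#" is URL-encoded as %23 in the search query.
  html.gsub(/#(\w+)/) do
    "<a href='http://twitter.com/#search?q=%23#{$1}' target='_blank'>##{$1}</a>"
  end
end

puts linkify("Hi @ben, see http://mytest.com/blog/post.html #test")
```

One limitation carried over from the original: the hashtag pass also sees the generated HTML, so a URL containing a `#fragment` would end up with a nested link.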

Found this.

I realise this question was asked in 2009, but Twitter's API now returns the URLs for you (and expands t.co links).
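With the API doing that work, the links can be read from the tweet's entities instead of being parsed out of the text. The sketch below assumes the Twitter API v1.1 tweet JSON, where `entities.urls` is an array of objects carrying the t.co `url` and the `expanded_url`. The payload and t.co URLs are made up for illustration, and `links_from_entities` is a hypothetical helper.

```ruby
require "json"

# Hypothetical, trimmed-down payload shaped like a v1.1 tweet object;
# the t.co URLs are made up and only the fields used below are included.
SAMPLE_TWEET = <<~JSON
  {
    "text": "Read https://t.co/abc123 and https://t.co/def456",
    "entities": {
      "urls": [
        { "url": "https://t.co/abc123", "expanded_url": "http://mytest.com/blog/post.html" },
        { "url": "https://t.co/def456", "expanded_url": "http://mytest.com/blog/post_two.html" }
      ]
    }
  }
JSON

# Prefer the expanded URL; fall back to the t.co link if it is missing.
def links_from_entities(tweet)
  tweet.fetch("entities", {}).fetch("urls", []).map do |entity|
    entity["expanded_url"] || entity["url"]
  end
end

p links_from_entities(JSON.parse(SAMPLE_TWEET))
# prints ["http://mytest.com/blog/post.html", "http://mytest.com/blog/post_two.html"]
```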

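It depends on how specific you want to get. It might help to post the regex you're using and the cases it fails to catch.

I think you can post links in tweets without http, so this would fail for something like "I really like www.this-site.com."

Hmm, interesting. Good comment. I've updated my answer to detect links that start with "www.".

OK, now try "Wow, stackoverflow.com is awesome!" :P

Yeah, re Andrei's comment: if you're going to worry about links that don't start with http://, you should check for non-space characters before every TLD.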
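Following the suggestion to key on the TLD rather than the protocol, here is a minimal sketch that makes the "http://" prefix optional and requires a known TLD instead. The `TLDS` list is a hypothetical, deliberately short sample, and `find_links` is an illustrative name.

```ruby
# Hypothetical, deliberately short TLD list; a real list would be far longer.
TLDS = %w[com org net edu gov io].freeze

# Optional http(s):// prefix, one or more "label." parts (which covers "www."),
# a TLD from the list, then an optional path ending on a word boundary.
BARE_LINK_RE = /\b(?:https?:\/\/)?(?:[-\w]+\.)+(?:#{TLDS.join("|")})\b(?:\/\S*\b)?/i

def find_links(tweet)
  tweet.scan(BARE_LINK_RE)
end

p find_links("Wow, stackoverflow.com is awesome!")
# prints ["stackoverflow.com"]
p find_links("I really like www.this-site.com.")
# prints ["www.this-site.com"]
```

The trade-off is false positives: it also picks up the domain part of an address such as ben@mytest.com, and any word-dot-TLD text that was not meant as a link.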

^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$
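That pattern is anchored with `^` and `$`, so it tests whether an entire string looks like a URL rather than finding URLs inside a tweet. One way to use it, sketched below under that reading, is to split the tweet on whitespace, trim trailing punctuation, and test each token. `links_by_token` is a hypothetical helper; the pattern is the one above with `%[a-f\d{2}]` corrected to `%[a-f\d]{2}`, split into its commented sections (Ruby's regex engine accepts the `(?#...)` comments).

```ruby
# The anchored URL pattern, one commented section per string.
# Single-quoted strings keep the backslashes exactly as written.
URL_PATTERN = Regexp.new([
  '^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?',
  '(?#Username:Password)(?:\w+:\w+@)?',
  '(?#Subdomains)(?:(?:[-\w]+\.)+',
  '(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))',
  '(?#Port)(?::[\d]{1,5})?',
  '(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?',
  '(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)',
  '(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*',
  '(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$'
].join)

# Hypothetical helper: split on whitespace, trim common trailing
# punctuation, and keep the tokens the anchored pattern accepts.
def links_by_token(tweet)
  tweet.split
       .map { |token| token.sub(/[.,!?;:)"]+\z/, "") }
       .select { |token| token.match?(URL_PATTERN) }
end

p links_by_token("Wow, stackoverflow.com is awesome!")
# prints ["stackoverflow.com"]
```

Splitting on whitespace is a simplification: a link glued to surrounding text, or followed by punctuation outside the trimmed set, will be missed.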