Ruby: simple URL cleanup

ruby regex


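I'm trying to do some basic URL cleanup so that

www.google.com
www.google.com/
http://google.com
http://google.com/
https://google.com
https://google.com/

are all replaced with http://www.google.com (or https://www.google.com if the input starts with https://).

Basically, I want a single regexp that checks whether there is an http/https at the beginning and whether there is a / at the end.

I tried this: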

"https://google.com".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

In this case I get:

=> #<MatchData "https://google.com" 1:"https://" 2:"google.com" 3:nil>

which is fine.

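Unfortunately, for:

"https://google.com/".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

I get:

=> #<MatchData "https://google.com/" 1:"https://" 2:"google.com/" 3:nil>

whereas I want

2:"google.com" 3:"/"

Do you know how to do this?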

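The mistake is obvious once you spot it ;)

You are trying:

^(http:\/\/|https:\/\/)(.*)(\/)*$

The answer is to use:

^(http:\/\/|https:\/\/)(.*?)(\/)*$

The added ? makes the * quantifier non-greedy, so the trailing forward slash is no longer swallowed by the .* and is left for the final group to match.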
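To make the difference concrete, here is a small sketch that runs both patterns against the trailing-slash input from the question. The variable names (greedy, lazy and so on) are just labels chosen for this illustration.

```ruby
# Greedy: (.*) consumes everything up to the end of the line, including
# the trailing slash, so the last group matches zero times and is nil.
greedy = /^(http:\/\/|https:\/\/)(.*)(\/)*$/

# Non-greedy: (.*?) matches as little as possible, which leaves the
# trailing slash for the last group.
lazy = /^(http:\/\/|https:\/\/)(.*?)(\/)*$/

url = "https://google.com/"

greedy_captures = greedy.match(url).captures
lazy_captures   = lazy.match(url).captures

p greedy_captures # => ["https://", "google.com/", nil]
p lazy_captures   # => ["https://", "google.com", "/"]
```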

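Edit:

Actually, you should really use:

^(http:\/\/|https:\/\/)?(www\.)?(.*?)(\/)*$

This way you also match the first two examples, which have no "http(s)://" prefix. It also captures the "www." part in its own group, so you can tell whether it was present.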
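As a rough illustration of what the four groups capture, the sketch below applies that pattern to a few of the inputs from the question. The names urls and captures_by_url are made up for this example.

```ruby
# Pattern from the edit above: the scheme and "www." are both optional groups.
pattern = /^(http:\/\/|https:\/\/)?(www\.)?(.*?)(\/)*$/

urls = [
  "www.google.com",
  "www.google.com/",
  "http://google.com",
  "https://google.com/"
]

# Map each input to its four captures:
# scheme, "www." prefix, remainder, trailing slash.
captures_by_url = urls.map { |url| [url, pattern.match(url).captures] }.to_h

captures_by_url.each { |url, captures| puts "#{url} -> #{captures.inspect}" }
# www.google.com -> [nil, "www.", "google.com", nil]
# www.google.com/ -> [nil, "www.", "google.com", "/"]
# http://google.com -> ["http://", nil, "google.com", nil]
# https://google.com/ -> ["https://", nil, "google.com", "/"]
```

A group that did not take part in the match comes back as nil, which is how you can tell that the scheme or the "www." prefix was missing.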

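Edit 2:

I was bored and wanted to perfect this :p

Here you go:

^(https?:\/\/)?(?:www\.)?(.*?)\/?$

Now all you need to do is rebuild the URL from the first capture group (or "http://" if it is nil), followed by "www.", followed by the second capture group.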
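The replacement step described above could be sketched roughly as follows. The method name normalize_url is hypothetical, and the call to strip is an extra assumption of this sketch (it is not part of the answer) to guard against stray surrounding whitespace.

```ruby
# Hypothetical helper: rebuilds a URL as scheme + "www." + remainder.
def normalize_url(url)
  # Group 1: optional scheme. "www." is matched but not captured.
  # Group 2: the remainder, without a single trailing slash.
  # strip is an assumption of this sketch, not part of the answer.
  match = /^(https?:\/\/)?(?:www\.)?(.*?)\/?$/.match(url.strip)
  return nil unless match

  scheme = match[1] || "http://" # default to http:// when no scheme was given
  "#{scheme}www.#{match[2]}"
end

inputs = [
  "www.google.com",
  "www.google.com/",
  "http://google.com",
  "http://google.com/",
  "https://google.com",
  "https://google.com/"
]

normalized = inputs.map { |url| normalize_url(url) }
puts normalized
```

With the six inputs from the question, the first four come out as http://www.google.com and the last two as https://www.google.com.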


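Edit (18 months later):

Check out my awesome Ruby gem, which can help with this kind of problem.

By the way, how are you handling the last URL, the one with the extra whitespace?

Good question, thanks. I'll work on it.

This is exactly what I wanted. Thanks.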

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples # => 
  ["google.com",
   "google.com/",
   "www.google.com",
   "www.google.com/",
   "http://google.com",
   "http://google.com/",
   "http://www.google.com",
   "http://www.google.com/",
   "https://google.com",
   "https://google.com/",
   "https://www.google.com",
   "https://www.google.com/"]

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples.map(&:subgroups) # =>
  [[],
   [],
   [],
   [],
   ["http://"],
   ["http://"],
   ["http://"],
   ["http://"],
   ["https://"],
   ["https://"],
   ["https://"],
   ["https://"]]