PHP: matching specific regex words in a URL


php regex


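I must admit I have never gotten used to regex, but I recently ran into a problem where working around it was more painful than using regex. I need to be able to match anything at the beginning of a string that follows this pattern:

{any_url_safe_word} + ("/http://" | "/https://" | "/www.") + {any word}

So the following should match:

cars/http://google.com#test
cars/https://google.com#test
cars/www.google.com#test

The following should not match:

cars/httdp://google.com#test
cars/http:/google.com#test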

What I have tried so far is: ^[\w]{1,500}\/[(http:\/\/)(https:\/\/])){0,50}
but this matches the cars/http part of cars/httpd://google.com
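
A likely reason for that partial match is that square brackets define a character class, not a group, so the parentheses and letters inside them are just individual allowed characters, and `{0,50}` lets that class repeat any number of times, including zero. A minimal sketch, assuming a simplified form of the attempt (the constant names are made up for this illustration):

```php
<?php
// Simplified form of the attempt: [...] is a character class, so this
// allows any run of the characters ( ) h t p s : / after the first slash.
const CLASS_ATTEMPT = '/^\w+\/[(http:\/\/)(https:\/\/)]{0,50}/';

// The same idea written with a real (non-capturing) group.
const GROUP_VERSION = '/^\w+\/(?:https?:\/\/)/';

preg_match(CLASS_ATTEMPT, 'cars/httpd://google.com', $m);
var_dump($m[0]); // string(9) "cars/http"

var_dump(preg_match(GROUP_VERSION, 'cars/httpd://google.com')); // int(0)
var_dump(preg_match(GROUP_VERSION, 'cars/https://google.com')); // int(1)
```

The class version stops at the "d" only because "d" is not one of the listed characters, which is why the match ends at cars/http.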
This regex will do the job:
<?php
$words = array(
    'cars/http://google.com#test',
    'cars/https://google.com#test',
    'cars/www.google.com#test',
    'cars/httdp://google.com#test',
    'cars/http:/google.com#test',
    'c a r s/http:/google.com#test'
    );

foreach($words as $value)
{
    /*
      ^             - start of the string
      \S+           - at least one non-space character
      \/            - slash
      (             - then either
        https?:\/\/ -   http with an optional s, then ://
        |           -   or
        www\.       -   www.
      )
      .+            - at least one more character
     */
    if (preg_match('/^\S+\/(https?:\/\/|www\.).+/', $value))
    {
        print $value. " good\n";
    }
    else
    {
        print $value. " bad\n";
    }
}
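
One detail worth checking in a pattern like this is how `|` binds. It has the lowest precedence, so unless the alternatives are wrapped in a single group, the `^\S+\/` prefix applies only to the first branch. A minimal sketch comparing the two forms (the helper `matches`, the constant names, and the probe string are made up for this illustration):

```php
<?php
// Ungrouped: reads as (^\S+\/https?:\/\/) OR (www\..+), so the www branch
// is tied neither to the start of the string nor to the slash.
const UNGROUPED = '/^\S+\/(https?:\/\/)|(www\.).+/';

// Grouped: prefix, slash, then either a scheme or www., then one more character.
const GROUPED = '/^\S+\/(https?:\/\/|www\.).+/';

// Hypothetical helper: true when $pattern matches $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$probe = 'c a r s www.google.com'; // spaces in the prefix and no slash at all

var_dump(matches(UNGROUPED, $probe)); // bool(true)
var_dump(matches(GROUPED, $probe));   // bool(false)
```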

^[\w\d]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}

If you want to capture everything that comes after it, you can append (.*) at the end.
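
As a rough sketch of how that capture could be read back in PHP (the constant `TAIL_PATTERN` and the helper `tailAfterDomain` are hypothetical names, not part of the answer):

```php
<?php
// The pattern above with (.*) appended; group 1 holds whatever follows the domain.
const TAIL_PATTERN = '/^[\w\d]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(.*)/';

// Hypothetical helper: returns the captured remainder, or null when nothing matches.
function tailAfterDomain(string $url): ?string
{
    if (preg_match(TAIL_PATTERN, $url, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(tailAfterDomain('cars/http://google.com#test'));  // string(5) "#test"
var_dump(tailAfterDomain('cars/httdp://google.com#test')); // NULL
```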



Also, since the rough list of URL-safe characters appears to be ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;= you could include those as well, which gives (simplified):
[a-z0-9-.~]+\/(https?:\/\/)?(www\.)?[a-z0-9]+\.[a-z]{2,6}([\/?#a-z0-9-.~])*

EDIT: Taking @CD001's comment into account. Be sure to use the i modifier if case does not matter to you.
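
To illustrate what the i modifier changes, here is a small sketch using a lowercase-only domain pattern (the pattern and constant names are assumptions chosen for the example, not the full expression from the answer):

```php
<?php
// Illustrative lowercase-only pattern, without and with the i modifier.
const DOMAIN_CS = '/^[a-z0-9]+\.[a-z]{2,6}$/';
const DOMAIN_CI = '/^[a-z0-9]+\.[a-z]{2,6}$/i';

var_dump(preg_match(DOMAIN_CS, 'Google.COM')); // int(0)
var_dump(preg_match(DOMAIN_CI, 'Google.COM')); // int(1)
```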
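What is this: {any_url_safe_word}?
For example: cars, ca_rs, ca_1_rs, etc., rather than "cars".
It is not that simple. A good regexp just for matching the domain name pattern is: ([a-zA-Z0-9](?:[a-zA-Z0-9\){a-zA-Z0-9](?!)*[a-zA-Z0-Z0-](?!$){0,61}[a-zA-Z0-9]?$
Correct, matching domain patterns is quite complex; I just picked the simplest version that fits his needs (I hope).
Heh, yes. Drop the \w\d and put the list of allowed characters in [...] instead, and I think you should be fine. The problem with \w is that it matches any Perl "word" character and varies with the locale PHP is running under. Technically you would match characters such as Ö, which are not valid URL characters (currently).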
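
A small sketch of the point about \w: an explicit ASCII class behaves the same regardless of locale or modifiers, whereas \w can accept more than ASCII (the helper `isAsciiWord` and the constant name are made up for this illustration):

```php
<?php
// Explicit ASCII class: its meaning does not depend on the locale or the u modifier.
const ASCII_WORD = '/^[A-Za-z0-9_]+$/';

// Hypothetical helper for this illustration.
function isAsciiWord(string $s): bool
{
    return preg_match(ASCII_WORD, $s) === 1;
}

var_dump(isAsciiWord('ca_1_rs')); // bool(true)
var_dump(isAsciiWord('Öl'));      // bool(false)

// For comparison: with the u modifier, \w is typically Unicode-aware in
// current PHP builds, so it may accept the same string.
var_dump(preg_match('/^\w+$/u', 'Öl'));
```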
With \w\d replaced by an explicit list of URL-safe characters, the pattern becomes:

^[!#$&-.0-;=?-\[\]_a-z~]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}