PHP regex to identify specific URL patterns


php regex


I have been trying to identify URL patterns in a page. To do this, I used the approach below, but ran into a problem.

-> PHP regex used:

~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i

This identifies almost every kind of URL, such as the following:

example.com
www.example.com
http://example.com
http://www.example.com    
https://example.com
https://www.example.com
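As a quick check, here is a minimal PHP sketch that runs the pattern above, unchanged, against each of the six sample inputs and collects the full matches. The `$samples` array just restates the list above; nothing else is assumed.

```php
<?php
// Sketch: apply the question's original pattern to each sample input.
// The pattern is copied verbatim from the question.
$urlPattern = '~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i';

// The six sample inputs listed in the question.
$samples = [
    'example.com',
    'www.example.com',
    'http://example.com',
    'http://www.example.com',
    'https://example.com',
    'https://www.example.com',
];

$found = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match, 0 otherwise; $m[0] holds the full match.
    if (preg_match($urlPattern, $sample, $m) === 1) {
        $found[] = $m[0];
    }
}

print_r($found);
```

Each sample should be matched in full, so `$found` ends up identical to `$samples`.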

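Unfortunately, it also treats decimal values, prices, phone numbers and IP addresses as URLs (something I had not considered before). To fix this, I found the following pattern to exclude those specific numeric values: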

/^[0-9]+(\.[0-9]{1,})+\S+\w?$/

Using this, numeric values like the following are excluded, which fixes the URL identifier:

Decimal values (1.11)

IP address (123.123.123.123)

Price ($11.11)
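The question does not say exactly how the two patterns are combined. A minimal sketch, assuming a two-pass approach: collect every candidate with the URL pattern, then drop any candidate that the numeric pattern matches in full. The sample `$text` is made up for illustration.

```php
<?php
// Sketch (assumed two-pass approach): find URL-like candidates first,
// then discard candidates that the numeric pattern matches in full.
$urlPattern     = '~((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)~i';
$numericPattern = '/^[0-9]+(\.[0-9]{1,})+\S+\w?$/';

// Illustrative input, not taken from the question.
$text = 'example.com Decimal (1.11) IP (123.123.123.123) Price ($11.11)';

preg_match_all($urlPattern, $text, $matches);
$candidates = $matches[0];

// Keep only candidates that do not look like plain numbers.
$urls = array_values(array_filter(
    $candidates,
    fn (string $c): bool => preg_match($numericPattern, $c) !== 1
));

print_r($candidates);
print_r($urls);
```

With this input, `$candidates` should contain `example.com`, `1.11`, `123.123.123.123` and `11.11` (the `$` of the price is never part of a candidate, because the URL pattern does not match it), and only `example.com` survives the filter.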

Now there is a new issue: abbreviations are also treated as URLs.

W.H.O (alphabetical abbreviation)

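So how can I write a PHP regex that identifies URLs while excluding the problem cases above?

Or

Can I use a PHP regex to identify values made of single letters, i.e. abbreviations like the example above?

Thanks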
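For the second question, a sketch under the assumption that an abbreviation is a run of single uppercase letters separated by dots, optionally ending in a dot (e.g. `W.H.O`, `U.S.A.`). `isAbbreviation()` and `$abbreviationPattern` are hypothetical names, and the pattern is deliberately case-sensitive so that lowercase host names such as `example.com` are never flagged.

```php
<?php
// Hypothetical pattern: one uppercase letter, then 1+ groups of "." + one
// uppercase letter, with an optional trailing dot. No /i flag on purpose.
$abbreviationPattern = '/^[A-Z](?:\.[A-Z])+\.?$/';

// Hypothetical helper: true when the whole candidate is an abbreviation.
function isAbbreviation(string $candidate, string $pattern): bool
{
    return preg_match($pattern, $candidate) === 1;
}

$candidates = ['W.H.O', 'U.S.A.', 'example.com', 'www.example.com'];

foreach ($candidates as $candidate) {
    $label = isAbbreviation($candidate, $abbreviationPattern) ? 'abbreviation' : 'keep';
    printf("%-16s %s\n", $candidate, $label);
}
```

Such a check could be used as a post-filter on the URL candidates, in the same way as the numeric pattern.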

You can put these exclusions into a negative lookahead and use:

$re = '~(?x)\b                   # Word boundary
   (?!                           # Exclusion list
     [A-Z](?:\.[A-Z])+\b         # An uppercase letter, then 1+ sequences of . + an uppercase letter
     |                           # or
     \d+(?:\.\d+)+\S+\b          # Digits, 1+ sequences of . + digits, then 1+ non-whitespace chars
   )
   (?:https?://)?                # Optional http / https protocol part
   (?:[-\w]+\.[-\w.]+)+          # 1+ sequences of 1+ - or word chars, then . and 1+ -, ., or word chars
   \w(?::\d+)?                   # A word char, then an optional : followed by 1+ digits (port)
   (?:/(?:[-\w/.]*(?:\?\S+)?)?)* # 0+ sequences of /, 0+ -, word, / or . chars, then an optional ? followed by 1+ non-whitespace chars
   \b~';                         # Word boundary
$str = 'example.com  www.example.com  http://example.com http://www.example.com     https://example.com https://www.example.com  Decimal Values (1.11)  IP Address (123.123.123.123)   W.H.O   Price values ($11.11)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
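The same pattern can also be written on a single line without the `(?x)` comments. A minimal sketch that wraps it in a hypothetical `extractUrls()` helper: with the sample string above, it should return only the six URLs, while `1.11`, `123.123.123.123`, `W.H.O` and `$11.11` are rejected by the negative lookahead.

```php
<?php
// Hypothetical helper: the answer's pattern collapsed onto one line
// (same tokens, just without the (?x) whitespace and comments).
function extractUrls(string $text): array
{
    $pattern = '~\b(?![A-Z](?:\.[A-Z])+\b|\d+(?:\.\d+)+\S+\b)'
        . '(?:https?://)?(?:[-\w]+\.[-\w.]+)+\w(?::\d+)?'
        . '(?:/(?:[-\w/.]*(?:\?\S+)?)?)*\b~';

    // preg_match_all returns false on a regex error; treat that as "no URLs".
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }
    return $matches[0];
}

$str = 'example.com  www.example.com  http://example.com http://www.example.com     '
     . 'https://example.com https://www.example.com  Decimal Values (1.11)  '
     . 'IP Address (123.123.123.123)   W.H.O   Price values ($11.11)';

print_r(extractUrls($str));
```

Unlike the question's original pattern, this one has no `i` flag, so the `[A-Z]` exclusion applies only to uppercase abbreviations.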



You should not post links to fake URLs. This is code, so you need to format it as code (that is the {} toolbar button).

Please excuse my typos :) Try it.