PHP: How do I use a regular expression to filter text that contains a URL?

php regex url

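I want to filter input text if it contains a URL. By URL I mean anything that corresponds to a valid internet address, such as www.example.com, example.com, http://www.example.com, or http://example.com/foo/bar.

I think I have to use a regular expression with the preg_match function, so I need the right regexp pattern.

I would be grateful if someone could give me one.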

This article has a good regular expression for matching URLs. For PHP, you need to escape the regular expression properly, for example:

$text = "here is some text that contains a link to www.example.com, and it will be matched.";
preg_match("/(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $text, $matches);
var_dump($matches);
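Much of the escaping in the example above comes from placing the pattern in a double-quoted string with `/` as the delimiter. As a minimal sketch (the shorter pattern and the `REGEX` label are assumptions for illustration, not the pattern from the article), a nowdoc combined with a different delimiter lets a pattern be written without escaping quotes or slashes:

```php
<?php
// Nowdoc: no escape sequences or variable interpolation are processed inside it,
// so the regex reaches PCRE exactly as typed.
$pattern = <<<'REGEX'
~\bhttps?://[^\s"'<>]+~i
REGEX;

$text = 'A quoted link: "http://example.com/foo/bar" in a sentence.';

// preg_match returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $text, $matches) === 1) {
    echo $matches[0], "\n";
}
```

Using `~` as the delimiter means the slashes in `://` need no backslash, and the nowdoc means neither `"` nor `'` has to be escaped inside the character class.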

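$html = "http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
You can surf the internet anonymously at https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi.";

// Assumption: the named-group labels did not survive in the source, so
// protocol, domain, file and parameters are placeholder names.
preg_match_all('/\b((?P<protocol>https?|ftp):\/\/(?P<domain>[-A-Z0-9.]+)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?)/i', $html, $urls, PREG_PATTERN_ORDER);
// $urls[1] holds the full URLs; $urls itself is left intact so it can be looped over.
$firstUrl = $urls[1][0];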

This will match:
http://www.scroogle.org
http://www.scroogle.org/
http://www.scroogle.org/index.html
http://www.scroogle.org/index.html?source=library
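https://ssl.scroogle.org/cgi-bin/nbbwssl.cgi
(The pattern as written also picks up the period that ends the sentence, because "." is allowed in the path character class.)

To loop over the results, you can use: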

for ($i = 0; $i < count($urls[0]); $i++) { echo $urls[1][$i]."\n"; }

This prints each matched URL on its own line.

Cheers, Lob

Found here: a function from WordPress

function _make_url_clickable_cb($matches) {
    $ret = '';
    $url = $matches[2];

    if ( empty($url) )
        return $matches[0];
    // removed trailing [.,;:] from URL
    if ( in_array(substr($url, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($url, -1);
        $url = substr($url, 0, strlen($url)-1);
    }
    return $matches[1] . "<a href=\"$url\" rel=\"nofollow\">$url</a>" . $ret;
}

function _make_web_ftp_clickable_cb($matches) {
    $ret = '';
    $dest = $matches[2];
    $dest = 'http://' . $dest;

    if ( empty($dest) )
        return $matches[0];
    // removed trailing [,;:] from URL
    if ( in_array(substr($dest, -1), array('.', ',', ';', ':')) === true ) {
        $ret = substr($dest, -1);
        $dest = substr($dest, 0, strlen($dest)-1);
    }
    return $matches[1] . "<a href=\"$dest\" rel=\"nofollow\">$dest</a>" . $ret;
}

function _make_email_clickable_cb($matches) {
    $email = $matches[2] . '@' . $matches[3];
    return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
}

function make_clickable($ret) {
    $ret = ' ' . $ret;
    // in testing, using arrays here was found to be faster
    $ret = preg_replace_callback('#([\s>])([\w]+?://[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_url_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_web_ftp_clickable_cb', $ret);
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', '_make_email_clickable_cb', $ret);

    // this one is not in an array because we need it to run last, for cleanup of accidental links within links
    $ret = preg_replace("#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i", "$1$3</a>", $ret);
    $ret = trim($ret);
    return $ret;
}

Yes, it looks good, but it doesn't work for me. It causes an error.

I edited my answer to include an example of the regular expression escaped for PHP.

This link may help with writing the regular expression. Possible duplicate.

What do you mean by filter? Remove everything else, or remove everything if it doesn't match? How many URLs can be part of the input, only one?

By filtering, I mean simply detecting text that contains a URL and preventing it from being stored in the database.
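Going by the asker's clarification (detect text that contains a URL and keep it out of the database), the check only needs a yes/no answer. Below is a minimal sketch; `containsUrl` is a hypothetical helper and its pattern is a deliberately loose assumption, not one of the patterns from the answers. Unlike those, it also accepts bare hosts such as example.com, at the cost of false positives on things like file.txt.

```php
<?php
// Hypothetical helper: returns true when $text appears to contain a URL.
function containsUrl(string $text): bool
{
    // Assumed pattern with three alternatives:
    //   1) scheme://...   2) www....   3) a bare host such as example.com
    $pattern = '~\b(?:(?:https?|ftp)://\S+|www\.\S+|[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b)~i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $text) === 1;
}

$inputs = [
    'plain text without links',
    'go to www.example.com',
    'see example.com for details',
    'http://example.com/foo/bar',
];

foreach ($inputs as $input) {
    // Only input reported as "accepted" would be passed on to storage.
    echo containsUrl($input) ? 'rejected: ' : 'accepted: ', $input, "\n";
}
```

Returning a boolean keeps the storage code independent of the pattern, so a stricter or looser regex can be swapped in later without touching the caller.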