Extract URLs from text in PHP

php html regex

I have the following text:

$string = "this is my friend's website http://example.com I think it is coll";

How can I extract the link into another variable?

I know I should use a regular expression, probably preg_match(), but I don't know how.

preg_match_all('/[a-z]+:\/\/\S+/', $string, $matches);

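This is a simple approach that works in many cases, though not all. All matches are put in $matches. Note that it does not cover links inside anchor elements.

URLs are hard to pin down with a single pattern, so you first have to decide what you want to capture. A simple example that captures anything starting with http:// or https:// could be: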

preg_match_all('!https?://\S+!', $string, $matches);
$all_urls = $matches[0];

Note that this is very basic and may capture invalid URLs. I suggest looking into something more thorough for anything complex.
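One way to tighten the basic pattern a little is to post-process each match. The sketch below is only an illustration: the helper name extract_valid_urls and the sample sentence are made up here, and it assumes that trailing sentence punctuation is never part of the link, which does not hold for every URL.

```php
<?php
// Hypothetical helper: collect http/https candidates, trim trailing
// punctuation, and keep only what PHP's own validator accepts.
function extract_valid_urls(string $text): array
{
    preg_match_all('!https?://\S+!', $text, $matches);

    $urls = [];
    foreach ($matches[0] as $candidate) {
        // \S+ also swallows punctuation that directly follows a link.
        $candidate = rtrim($candidate, '.,;:!?)');
        if (filter_var($candidate, FILTER_VALIDATE_URL) !== false) {
            $urls[] = $candidate;
        }
    }
    return $urls;
}

print_r(extract_valid_urls('Visit http://example.com, or https://example.org/a.'));
```

Trimming ")" is a trade-off: it cleans up links written inside parentheses but would also cut a URL that legitimately ends with one.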

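If the text you are extracting URLs from is user-submitted and you are going to display the results as links anywhere, you have to be very, very careful to avoid XSS vulnerabilities: most prominently "javascript:" protocol URLs, but also URLs that might trick your regexp and/or the displaying browser into executing them as JavaScript URLs. At the very least, you should only accept URLs that start with "http", "https" or "ftp".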
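A minimal sketch of that advice might look as follows. The function name safe_link and the rel="nofollow" attribute are assumptions for the example, and it is not a complete XSS defence, only the scheme allow-list plus HTML escaping.

```php
<?php
// Hypothetical helper: returns an HTML link for $url, or null when the
// scheme is not on the allow-list (http, https, ftp).
function safe_link(string $url): ?string
{
    $allowed = ['http', 'https', 'ftp'];

    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), $allowed, true)) {
        return null;
    }

    // Escape before placing the URL in an attribute or in the link text.
    $escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $escaped . '" rel="nofollow">' . $escaped . '</a>';
}

var_dump(safe_link('javascript:alert(1)'));        // NULL
echo safe_link('http://example.com/?a=1&b=2'), "\n";
```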

Jeff has also written about some other problems with extracting URLs.

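Probably the safest way is to use the code snippet from WordPress. Download the latest version (currently 3.1.1) and look at wp-includes/formatting.php. There is a function named make_clickable that takes plain text as its parameter and returns a formatted string. You can grab the code that extracts the URLs. It is fairly complex, though.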

This one-line regular expression might help:

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

However, this regular expression still cannot reject some malformed URLs (for example, http://google:ha.ckers.org).
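A follow-up check on the host part can catch cases like the one above. This is a rough sketch rather than a full URL validator: has_plain_host is a hypothetical helper, and it assumes ASCII host names with an optional numeric port.

```php
<?php
// Hypothetical helper: true when the URL starts with http(s)://, followed by
// a host made of letters, digits, dots and hyphens, an optional numeric port,
// and then either the end of the string or the start of a path/query/fragment.
function has_plain_host(string $url): bool
{
    return preg_match('!^https?://[a-z0-9.-]+(?::\d{1,5})?(?:[/?#]|$)!i', $url) === 1;
}

var_dump(has_plain_host('http://google:ha.ckers.org'));    // bool(false)
var_dump(has_plain_host('https://example.com:8080/path')); // bool(true)
```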

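See also:

preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
"(.*?)[\"\']+.*?>"."([^

I tried to use WordPress as Nobu suggested, but because of its heavy dependencies on other WordPress functions I chose to take Nobu's regular expression and turn it into a function using preg_replace_callback(): a function that replaces every link in a text with a clickable link. It uses an anonymous function, so you need PHP 5.3, or you can rewrite the code to use an ordinary function instead.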
<?php 

/**
 * Make clickable links from URLs in text.
 */

function make_clickable($text) {
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
    return preg_replace_callback($regex, function ($matches) {
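        // Escape the match so it is safe in the attribute and in the link text.
        $url = htmlspecialchars($matches[0], ENT_QUOTES);
        return "<a href='{$url}'>{$url}</a>";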
    }, $text);
}

You can do it like this:
<?php
$string = "this is my friend's website http://example.com I think it is coll";
echo explode(' ',strstr($string,'http://'))[0]; //"prints" http://example.com
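If only the first link is needed, preg_match() (the function the question asks about) can do the same job and also covers https:// links. A minimal sketch, with first_url as a made-up helper name:

```php
<?php
// Hypothetical helper: returns the first http/https URL in $text,
// or null when the text contains none.
function first_url(string $text): ?string
{
    if (preg_match('!https?://\S+!', $text, $match) === 1) {
        return $match[0];
    }
    return null;
}

$string = "this is my friend's website http://example.com I think it is coll";
echo first_url($string), "\n"; // http://example.com
```

Returning null makes the "no link found" case explicit, which the strstr() one-liner does not handle.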

The code that worked for me (especially if you have several links in your $string) is:
$string = "this is my friend's website https://www.example.com I think it is cool, but this one is cooler https://www.stackoverflow.com :)";
$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $string, $matches);
$urls = $matches[0];
// go over all the links
foreach ($urls as $url)
{
    echo $url . "\n";
}

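Hope this helps others as well.

You can try this to find the links and rewrite them (adding the href link):
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The text you want to filter for URLs
$text = "The text you want to filter goes here. http://example.com";
if (preg_match($reg_exUrl, $text, $url)) {
    // $0 is the whole match, so each URL is wrapped in its own anchor
    echo preg_replace($reg_exUrl, '<a href="$0">$0</a>', $text);
} else {
    echo "No URL in the text";
}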

This regex works great for me; I have checked it against all kinds of URLs:
<?php
$string = "Thisregexfindurlhttp://www.rubular.com/r/bFHobduQ3n mixedwithstring";
preg_match_all('/(https?|ssh|ftp):\/\/[^\s"]+/', $string, $url);
$all_url = $url[0]; // Returns Array Of all Found URL's
$one_url = $url[0][0]; // Gives the First URL in Array of URL's
?>

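public function find_links($post_content){
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    // Check if there is a url in the text
    if(preg_match_all($reg_exUrl, $post_content, $urls)) {
        // make the urls hyper links,
        foreach($urls[0] as $url){
            $post_content = str_replace($url, '<a href="'.$url.'" rel="nofollow"> LINK </a>', $post_content);
        }
        //var_dump($post_content);die(); //uncomment to see result
        //return text with hyper links
        return $post_content;
    } else {
        // if no urls in the text just return the text
        return $post_content;
    }
}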
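One thing to watch with the str_replace() loop above is text that contains the same URL twice: the second pass would also rewrite the URL inside the href that the first pass produced. A single-pass variant avoids that. This is a sketch only: link_urls is a hypothetical name, and unlike the original it uses the URL itself as the link text and HTML-escapes it.

```php
<?php
// Hypothetical single-pass variant: every match is replaced exactly once,
// so repeated URLs cannot be rewritten a second time.
function link_urls(string $text): string
{
    $pattern = '/(?:https?|ftps?):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(?:\/\S*)?/';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $url . '" rel="nofollow">' . $url . '</a>';
    }, $text);
}

echo link_urls('a http://a.com b http://a.com'), "\n";
```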
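URLs have a lot of edge cases. For example, a URL may contain brackets, or it may come without a protocol. That is why a regex alone is not enough.
I created a PHP library that handles many of these edge cases.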
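To illustrate the bracket edge case, here is a small comparison between the naive \S+ pattern and the one-line pattern quoted earlier in this thread. The sample sentences and the helper name first_match are assumptions made for the demo.

```php
<?php
// Hypothetical helper: first match of $pattern in $text, or null.
function first_match(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$naive  = '!https?://\S+!';
$strict = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';

// A link written inside parentheses: the naive pattern keeps the ")".
$inParens = '(see http://example.com/page)';
echo first_match($naive, $inParens), "\n";  // http://example.com/page)
echo first_match($strict, $inParens), "\n"; // http://example.com/page

// A URL that really ends with a bracketed part is kept whole.
$wiki = 'http://en.wikipedia.org/wiki/PHP_(language) is fine';
echo first_match($strict, $wiki), "\n";     // http://en.wikipedia.org/wiki/PHP_(language)
```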
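Here is a function I use. I can't remember where it came from, but it seems to do a pretty good job of finding links in text and turning them into links.
You can change the function to suit your needs. I just wanted to share it because, while looking around, I remembered I had this in one of my helper libraries.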
function make_links($str){

  $pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';

  return preg_replace_callback("#$pattern#i", function($matches) {
    $input = $matches[0];
    $url = preg_match('!^https?://!i', $input) ? $input : "http://$input";
    return '<a href="' . $url . '" rel="nofollow" target="_blank">' . "$input</a>";
  }, $str);
} 

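Usage:

$subject = "this is a link http://google:ha.ckers.org maybe don't want to visit it?";
echo make_links($subject);

Output:

this is a link <a href="http://google:ha.ckers.org" rel="nofollow" target="_blank">http://google:ha.ckers.org</a> maybe don't want to visit it?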
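Comments:

-1: You have just created an XSS vulnerability, because this will also extract javascript: URLs.

He didn't say what he is going to use it for, so I'm not accounting for that. He just wanted to get the URLs into a variable.

@Michael: Finding javascript URLs is not yet a vulnerability; using them without any check is. Sometimes the presence and number of such URLs is useful information. I would have chosen a different delimiter. :)

I have played with WordPress formatting.php. Using make_clickable is a nice idea, but it ends up pulling in half of WordPress as dependencies.

Nice one, making sure the last character is not something odd.

This won't recognize URLs without http, like google.com.

This regex will match: "@https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)@". I have forgotten where I found it, so I can't give better credit than this (context: WordPress).

Possible duplicate.

@Michael Berkowski how can this be the duplicate? This user asked on May 26, 2009 at 14:13, while the question you linked was asked on Dec 8, 2010 at 17:44. It may be the other way around.

Note: I have updated your answer to use an anonymous function as the callback instead of create_function().

I have tested all the answers; only one of them strips the HTML tags.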
<?php
preg_match_all('/(href|src)[\s]?=[\s\"\']?+(.*?)[\s\"\']+.*?/', $webpage_content, $link_extracted);