Problem with a PHP regular expression for comment codes


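I'm currently building a homepage where logged-in users can write comments. The comment string is first run through a function that uses str_replace to replace the emoticons. After that, I want to replace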

[url=www.whatever.com]linktext[/url]

with:

<a href="www.whatever.com">linktext</a>

The reason is that I want to strip from the text all HTML that isn't produced by the comment codes, in case some users decide to get creative.
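PHP ships two built-in functions for neutralizing HTML that a user types into a comment. A minimal comparison, assuming a made-up comment string: strip_tags() removes the tags, while htmlspecialchars() keeps them but makes the browser show them as text. Either way this has to happen before your own markup (links, smiley images) is added, or that markup would be removed or escaped as well.

```php
<?php
// Made-up comment containing HTML a user might type.
$comment = '<a href="x" onclick="alert(1)">hi</a>';

// strip_tags() deletes the tags but keeps the text between them.
$stripped = strip_tags($comment);

// htmlspecialchars() keeps the text but turns <, >, & and (with ENT_QUOTES)
// both kinds of quotes into entities, so the browser displays the markup
// instead of interpreting it.
$escaped = htmlspecialchars($comment, ENT_QUOTES);

echo $stripped, "\n"; // hi
echo $escaped, "\n";  // &lt;a href=&quot;x&quot; onclick=&quot;alert(1)&quot;&gt;hi&lt;/a&gt;
```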

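I figured preg_replace would be the best choice, but the code I ended up with (partly from reading about regular expressions in my trusted O'Reilly "SQL and PHP" book, partly from the web) is pretty crazy and, above all, doesn't work.

Any help would be greatly appreciated, thanks.

It's probably possible to replace the whole thing in one go instead of splitting it into two parts as I did. I just decided it would be easier to get the two smaller parts working first and then merge them.

Code: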

function text_format($string)
{
    $pattern="/([url=)+[a-zA-Z0-9]+(])+/";
    $string=preg_replace($pattern, "/()+/", $string);
    $pattern="/([\/url])+/";
    $string=preg_replace($pattern, "/()+/", $string);
    return $string;
}

I'd try the following:

function text_format($string)
{
    return preg_replace('#\[url=([^\]]+)\]([^\[]*)\[/url\]#', '<a href="$1">$2</a>', $string);
}

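Another option is to take this logic and put it into a callback function.

Finally, this is obviously a common "problem" that others have solved many times; if using a more mature open-source solution is an option, I'd recommend looking for one.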
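A minimal sketch of that callback variant, using PHP's preg_replace_callback() with the same pattern as the one-liner; the name text_format_cb is hypothetical. Escaping both captures stops quotes or tags in the user's input from breaking out of the href attribute, but note that it does not whitelist URI schemes, so a javascript: link would still get through.

```php
<?php
// Hypothetical callback-based variant of text_format().
function text_format_cb(string $string): string
{
    return preg_replace_callback(
        '#\[url=([^\]]+)\]([^\[]*)\[/url\]#',
        function (array $m): string {
            // Escape both captures so quotes or tags typed by the user
            // cannot break out of the href attribute or the link text.
            $href = htmlspecialchars($m[1], ENT_QUOTES);
            $text = htmlspecialchars($m[2], ENT_QUOTES);
            return '<a href="' . $href . '">' . $text . '</a>';
        },
        $string
    );
}

echo text_format_cb('[url=www.whatever.com]linktext[/url]'), "\n";
// <a href="www.whatever.com">linktext</a>
```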


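It looks like you're using something like BBCode. Why not use a BBCode parser, like this one?

It also handles smileys, replacing them with images. If you try their test page, you'll still see text, because they don't host the images and they set the alt text to the smiley.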
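If you keep your own smiley step instead of using a parser, one way to do the image replacement is strtr() with an array. A sketch, assuming a hypothetical smiley table (the codes and image paths are placeholders). It should run after any HTML escaping; otherwise the generated <img> tags would themselves be escaped or stripped.

```php
<?php
// Hypothetical smiley table; codes and image paths are placeholders.
const SMILEYS = [
    ':)' => '<img src="smileys/smile.png" alt=":)">',
    ':(' => '<img src="smileys/sad.png" alt=":(">',
    ':D' => '<img src="smileys/grin.png" alt=":D">',
];

function replace_smileys(string $text): string
{
    // strtr() with an array tries the longest keys first and never searches
    // text it has already replaced, so the ":)" inside an alt attribute is
    // left alone.
    return strtr($text, SMILEYS);
}

echo replace_smileys('hi :)'), "\n";
// hi <img src="smileys/smile.png" alt=":)">
```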


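@Lauri Lehtinen's answer is useful for learning the idea behind the technique, but you shouldn't use it in practice, because it leaves your site wide open to XSS attacks. Link spammers will also appreciate the missing rel="nofollow" on the generated links.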
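To make the XSS risk concrete, here is the one-liner from the first answer run on two hostile inputs. Both are made-up examples; the point is that $1 is pasted into the attribute verbatim.

```php
<?php
// The one-liner from the first answer, unchanged.
function text_format($string)
{
    return preg_replace('#\[url=([^\]]+)\]([^\[]*)\[/url\]#', '<a href="$1">$2</a>', $string);
}

// A javascript: URL is copied into href untouched.
echo text_format('[url=javascript:alert(1)]click me[/url]'), "\n";
// <a href="javascript:alert(1)">click me</a>

// A double quote closes the attribute early and injects an event handler.
echo text_format('[url=#" onclick="alert(1)"]click me[/url]'), "\n";
// <a href="#" onclick="alert(1)"">click me</a>
```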

Use the following instead:

<?php
// \author Daniel Trebbien
// \date 2010-06-22
// \par License
//  Public Domain

$allowed_uri_schemes = array('http', 'https', 'ftp', 'ftps', 'irc', 'mailto');

/**
 * Encodes a string in RFC 3986
 *
 * \see http://tools.ietf.org/html/rfc3986
 */
function encode_uri($str)
{
    $str = urlencode('' . $str);
    $search = array('%3A', '%2F', '%3F', '%23', '%5B', '%5D', '%40', '%21', '%24', '%26', '%27', '%28', '%29', '%2A', '%2B', '%2C', '%3B', '%3D', '%2E', '%7E');
    $replace = array(':', '/', '?', '#', '[', ']', '@', '!', '$', '&', '\'', '(', ')', '*', '+', ',', ';', '=', '.', '~'); // gen-delims / sub-delims / unreserved
    return str_ireplace($search, $replace, $str);
}

function url_preg_replace_callback($matches)
{
    global $allowed_uri_schemes;

    if (empty($matches[1]))
        return $matches[0];
    $href = trim($matches[1]);
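    // A ':' that appears before any '/' introduces a URI scheme, which must be
    // on the whitelist (otherwise e.g. `javascript:alert(1)//` would get through).
    if (($i = strpos($href, ':')) !== FALSE) {
        $slash = strpos($href, '/');
        if ($slash === FALSE || $slash > $i) {
            if (!in_array(strtolower(substr($href, 0, $i)), $allowed_uri_schemes))
                return $matches[0];
        }
    }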

    // unescape `\]`, `\\\]`, `\\\\\]`, etc.
    for ($j = strpos($href, '\\]'); $j !== FALSE; $j = strpos($href, '\\]', $j)) {
        for ($i = $j - 2; $i >= 0 && $href[$i] == '\\' && $href[$i + 1] == '\\'; $i -= 2)
            /* empty */;
        $i += 2;

        $h = '';
        if ($i > 0)
            $h = substr($href, 0, $i);
        for ($numBackslashes = floor(($j - $i)/2); $numBackslashes > 0; --$numBackslashes)
            $h .= '\\';
        $h .= ']';
        if (($j + 2) < strlen($href))
            $h .= substr($href, $j + 2);
        $href = $h;
        $j = $i + floor(($j - $i)/2) + 1;
    }

    if (!empty($matches[2]))
        $href .= str_replace('\\\\', '\\', $matches[2]);

    if (empty($matches[3]))
        $linkText = $href;
    else {
        $linkText = trim($matches[3]);
        if (empty($linkText))
            $linkText = $href;
    }
    $href = htmlspecialchars(encode_uri(htmlspecialchars_decode($href)));
    return "<a href=\"$href\" rel=\"nofollow\">$linkText</a>";
}

function render($input)
{
    $input = htmlspecialchars(strip_tags('' . $input));
    $input = preg_replace_callback('~\[url=((?:[^\]]|(?<!\\\\)(?:\\\\\\\\)*\\\\\])*)((?<!\\\\)(?:\\\\\\\\)*)\]' . '((?:[^[]|\[(?!/)|\[/(?!u)|\[/u(?!r)|\[/ur(?!l)|\[/url(?!\]))*)' . '\[/url\]~i', 'url_preg_replace_callback', $input);
    return $input;
}

Alternatively, just take a look.

As you can see, it works.
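The part of url_preg_replace_callback() that decides whether a link target is safe can be pulled out into a small function and tested on its own. A sketch: is_allowed_href is a hypothetical name, and its default whitelist mirrors $allowed_uri_schemes above. It treats a ':' that occurs before any '/' as the end of a URI scheme; anything without such a scheme is treated as a relative reference and allowed.

```php
<?php
// Hypothetical helper isolating the scheme whitelist check.
function is_allowed_href(string $href, array $allowed = ['http', 'https', 'ftp', 'ftps', 'irc', 'mailto']): bool
{
    $href = trim($href);
    $colon = strpos($href, ':');
    if ($colon === false) {
        return true;   // no scheme at all, e.g. "www.whatever.com"
    }
    $slash = strpos($href, '/');
    if ($slash !== false && $slash < $colon) {
        return true;   // the colon sits inside a path, e.g. "/a:b"
    }
    // Everything before the first ':' is the scheme; compare case-insensitively.
    $scheme = strtolower(substr($href, 0, $colon));
    return in_array($scheme, $allowed, true);
}

var_dump(is_allowed_href('http://www.bing.com/'));  // bool(true)
var_dump(is_allowed_href('javascript:alert(1)//')); // bool(false)
```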


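This doesn't answer your question at all, but you may really want to look at existing tools (and the formats they use) instead of rolling your own.

I don't think you can match a URL with [a-zA-Z0-9]. What about characters like -, /, &, :, #, etc.?

Thanks a lot for the quick answer. Regular expressions are really confusing for a noob, and probably still are after years of programming experience. Looking at a pile of brackets makes my brain melt :) You're a god, the code works like a charm. Thanks a lot for the help.

@Rakoon - no need to get religious on us. I think experience plays a very small role here.

True. But after racking my brain for a while, an instant and sensible answer to a problem that is tricky (for a noob) makes me a bit religious :)

Hmm. Good idea, but I've already started on the parsing and only need a few simple formatting options. The smileys are custom and built into the theme system I made, so it's possible to have theme-independent smileys. Anyway, thanks for the answer.

Hi, thanks for the reply. This may not be a problem for the page I'm writing now, since it's a web application where users can change content and add or delete galleries and blog posts. However, the users are people appointed by a super admin, so nobody can register and get access to commenting. If I decide to extend the model to allow registration, it will become a bigger problem. The O'Reilly book mentioned strip_tags and, I think, something called a "real" string, for preserving the text. Can people still write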
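To illustrate the comment about [a-zA-Z0-9]: a character class limited to letters and digits stops at the first ':' of a real URL, so the question's pattern can never match it, whereas the answer's [^\]]+ takes everything up to the closing bracket. The URL below is made up.

```php
<?php
// The question's URL part, [a-zA-Z0-9]+, versus the [^\]]+ used in the answer.
$input = '[url=http://www.whatever.com/a-b?x=1&y=2#top]link[/url]';

$narrow = preg_match('#\[url=[a-zA-Z0-9]+\]#', $input);   // stops at ':' so no match
$broad  = preg_match('#\[url=([^\]]+)\]#', $input, $m);   // anything up to ']'

var_dump($narrow);  // int(0)
var_dump($broad);   // int(1)
echo $m[1], "\n";   // http://www.whatever.com/a-b?x=1&y=2#top
```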

Test cases for render():

echo render('[url=http://www.bing.com/][[/[/u[/ur[/urlBing[/url]') . "\n";
echo render('[url=][/url]') . "\n";
echo render('[url=http://www.bing.com/][[/url]') . "\n";
echo render('[url=http://www.bing.com/][/[/url]') . "\n";
echo render('[url=http://www.bing.com/][/u[/url]') . "\n";
echo render('[url=http://www.bing.com/][/ur[/url]') . "\n";
echo render('[url=http://www.bing.com/][/url[/url]') . "\n";
echo render('[url=http://www.bing.com/][/url][/url]') . "\n";
echo render('[url=    javascript: window.alert("hi")]click me[/url]') . "\n";
echo render('[url=#" onclick="window.alert(\'hi\')"]click me[/url]') . "\n";
echo render('[url=http://www.bing.com/]       [/url]') . "\n";
echo render('[url=/?#[\\]@!$&\'()*+,;=.~]       [/url]') . "\n"; // link text should be `/?#[]@!$&amp;'()*+,;=.~`
echo render('[url=http://localhost/\\\\]d]abc[/url]') . "\n"; // href should be `http://localhost/%5C`, link text should be `d]abc`
echo render('[url=\\]][/url]') . "\n"; // link text should be `]`
echo render('[url=\\\\\\]][/url]') . "\n"; // link text should be `\]`
echo render('[url=\\\\\\\\\\]][/url]') . "\n"; // link text should be `\\]`
echo render('[url=a\\\\\\\\\\]bcde\\]fgh\\\\\\]ijklm][/url]') . "\n"; // link text should be `a\\]bcde]fgh\]ijklm`