PHP: tagging words in vBulletin posts with support for Unicode characters


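I have a vBulletin plugin that replaces every hashtag with a link (HREF), but it needs customizing because it does not support non-English characters.

For example, #vbulletin is converted to a link as a whole, but in #može only the #mo part is converted into a hashtag link; the rest of the word is left as plain text.

Since I am not very skilled with PHP, I am copying the contents of the file here to make the problem easier to understand: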

<?php
$hashes = array();
do
{
    if (!$matches = USERTAG::match(preg_replace('#\[(\w+?)(?>[^\]]*?)\](.*)(\[/\1\])#siU', '', $message), 'hash'))
    {
        break;
    }

    foreach ($matches as $hash)
    {
        $hash = trim($hash);

        if (!$hash)
        {
            continue;
        }

        $hashes[] = htmlspecialchars_uni($hash);
    }

    if (!empty($hashes))
    {
        $hashes = array_unique($hashes);

        if ($info['postid'])
        {
            $hashlist = USERTAG::$db->fetchAll('
                SELECT *
                FROM $usertag_hash AS hash
                WHERE hash :queryList
                    AND postid = ?
                    AND type = ?
            ', array(
                ':queryList' => USERTAG::$db->queryList($hashes),
                $info['postid'],
                $info['type']
            ));
            foreach ($hashlist as $results_r)
            {
                $key = array_search($results_r['hash'], $hashes);
                if ($key === false)
                {
                    continue;
                }

                unset($hashes[$key]);
            }
        }

        foreach ($hashes as $key => $hash)
        {
            $hash = unhtmlspecialchars($hash);

            if (!$hash)
            {
                unset($hashes[$key]);
                continue;
            }

            $possible = array('/\[hash]' . preg_quote($hash, '/') . '\[\/hash\]/iU', '/#' . preg_quote($hash) . '/iU');
            $message = preg_replace($possible, '[URL=' . $this->registry->options['bburl'] . '/usertag.php?do=list&action=hash&hash=' . urlencode($hash) . ']#' . $hash . '[/URL] ', $message, -1, $found);
        }

        $info['hash'] = $hashes;
    }
}
while (false);
?>

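As I said, I am not very good at PHP, so I may be wrong about where the problem is. I tried changing some parts using examples I found here and on other sites, but without any success.

I would really appreciate any help, so that I can tag words containing Serbian Latin characters such as šđžčćŠĐŽČĆ and, if possible, the whole Serbian Cyrillic alphabet as well.

The encoding on my forum is UTF-8, the database collation is utf8_general_ci, and Serbian letters are displayed correctly in posts. I don't know whether that matters; I mention it just in case.

Thanks in advance.

Regards.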

The problem probably comes from this line, which defines the two patterns used to process the hashtags:

$possible = array('/\[hash]' . preg_quote($hash, '/') . '\[\/hash\]/iU', '/#' . preg_quote($hash) . '/iU');

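You can drop the U modifier, which inverts greediness (greedy quantifiers become lazy and vice versa) and is useless most of the time, and add the u modifier so that the patterns can handle Unicode characters. The line can then be rewritten like this: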

$possible = array('~\[hash]' . preg_quote($hash, '~') . '\[/hash]~iu', '/#' . preg_quote($hash) . '/iu');
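To see what the u modifier changes, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not part of the plugin, and the script file is assumed to be saved as UTF-8. Without u, `\w` works byte by byte and, in the default C locale, only recognises ASCII word characters, which is consistent with the `#mo` behaviour described in the question.

```php
<?php
// Hypothetical sample post (not taken from the plugin).
$sampleMessage = 'Test #vbulletin and #može';

// No u modifier: the subject is treated as bytes, so the match is expected
// to stop in front of the two-byte UTF-8 sequence that encodes "ž".
preg_match_all('/#\w+/', $sampleMessage, $byteMatches);

// With the u modifier the subject is read as UTF-8 and \w also covers
// non-ASCII letters.
preg_match_all('/#\w+/u', $sampleMessage, $unicodeMatches);

// An explicit class (Unicode letters, digits, underscore) states the intent
// directly; it covers Latin and Cyrillic letters alike.
preg_match_all('/#[\p{L}\p{N}_]+/u', 'Tag #može and #српски', $explicitMatches);

print_r($byteMatches[0]);
print_r($unicodeMatches[0]);
print_r($explicitMatches[0]);
```

Note that this only shows how the patterns behave. Whether the plugin's own `USERTAG::match` method, whose code is not shown, uses a comparable pattern is an open question.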


The pattern:

#\[(\w+?)(?>[^\]]*?)\](.*)(\[/\1\])#siU

can also be rewritten like this:


#\[(\w+)[^]]*](.*?)(\[/\1])#siu
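As a quick check of what the rewritten stripping pattern does, the sketch below runs it on a made-up message. The sample text and the `[quote]` tag are assumptions for illustration; any paired BBCode tag should behave the same way.

```php
<?php
// Hypothetical sample message; [quote] stands in for any paired BBCode tag.
$sampleWithBbcode = 'Hello [quote=user]#skipped[/quote] world #može';

// Rewritten pattern: the lazy (.*?) takes over the job of the U modifier,
// and u makes the pattern and the subject UTF-8 aware.
$withoutBbcode = preg_replace('#\[(\w+)[^]]*](.*?)(\[/\1])#siu', '', $sampleWithBbcode);

var_dump($withoutBbcode);
```

The whole `[quote=user]...[/quote]` block is removed, so a hashtag inside it is not picked up, while `#može` outside the block is left intact for the later matching step.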

I'm not sure this will solve every problem, but it is at least a start.
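One thing worth checking once the u modifier is in place: if the subject (or the pattern) is not valid UTF-8, `preg_replace` returns `null` instead of a string, which can silently empty the message. The helper below is a hypothetical sketch, not part of the plugin; the function name `replace_utf8` is made up for illustration.

```php
<?php
// Hypothetical helper (not part of the plugin): wraps preg_replace and
// reports failures instead of passing null on to the caller.
function replace_utf8($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);

    if ($result === null) {
        // PREG_BAD_UTF8_ERROR is set when u is used on malformed UTF-8.
        if (preg_last_error() === PREG_BAD_UTF8_ERROR) {
            throw new RuntimeException('Subject or pattern is not valid UTF-8');
        }
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    return $result;
}

// Valid UTF-8 input goes through as usual.
echo replace_utf8('/#\w+/u', '[tag]', 'a #može'), "\n";
```

If this throws for real posts, the text reaching the plugin is probably not UTF-8 at that point, regardless of how the forum displays it.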

Thank you for the quick response. Unfortunately, this did not solve the problem, but thank you anyway.