PHP word filter

php wordpress

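I'm developing a WordPress plugin that replaces bad words in comments with a new word picked at random from a list.

I currently have two arrays: one containing the bad words and one containing the good words.
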
$bad = array("bad", "words", "here");
$good = array("good", "words", "here");

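Since I'm a beginner, I got stuck at some point.

To replace the bad words, I have been using

$newstring = str_replace($bad, $good, $string);

My first problem is that I want to turn off case sensitivity, so that I don't have to list every variant such as "bad", "Bad", "BAD", "bAd", "BAd", etc., but I need the new word to keep the format of the original word. For example, if I write "Bad" it should be replaced with "Words", but if I type "bad" it should be replaced with "words", and so on.

My first thought was to use str_ireplace, but it forgets whether the original word had capital letters.

The second problem is that I don't know how to handle users who type like this: "b a d", "w o r d s", and so on. I need an idea.

To make it pick a random word, I think I can use

$new = $good[rand(0, count($good) - 1)];

and then

$newstring = str_replace($bad, $new, $string);

If you have a better idea, I'm all ears.

The overall look of my script: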
function noswear($string)
{
    if ($string)
    {
        $bad = array("bad", "words");
        $good = array("good", "words");
        $newstring = str_replace($bad, $good, $string);
        return $newstring;
    }
    // Nothing to filter: return the empty input unchanged
    return $string;
}

echo noswear("I see bad words coming!");

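Thanks in advance for your help.

I came up with this method, and it works well. It returns true if the input is one of the bad words.
For example:
Since the word "bad" is on the blacklist, it will be reported.
Edit 1:
As rid suggested, you can also do a simple in_array check: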
function badWordsFilter($inputWord) {
  $badWords = Array("bad","words","here");
     if(in_array(strtolower($inputWord), $badWords) ) {
        return true;
     }
  return false;
}
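The function above checks a single word at a time. To apply the same blacklist test to a whole comment, one option is to split the text into words first. This is only a rough sketch: containsBadWord is a hypothetical helper name, and the splitting rule (letters, digits and apostrophes count as word characters) is an assumption.

```php
<?php
// Hypothetical helper: returns true if any word in $text is on the blacklist.
function containsBadWord(string $text, array $badWords): bool
{
    // Split on anything that is not a letter, a digit or an apostrophe.
    $words = preg_split("/[^\\p{L}\\p{N}']+/u", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // array_intersect keeps only the words that also appear in the blacklist.
    return count(array_intersect($words, $badWords)) > 0;
}

$badWords = array("bad", "words", "here");
var_dump(containsBadWord("I see BAD things coming!", $badWords)); // bool(true)
var_dump(containsBadWord("Nothing to see", $badWords));           // bool(false)
```

Like the in_array version, this only detects exact words; it does nothing about spaced-out or disguised spellings.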

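Edit 2:
As promised, and since you mentioned it in your question, I came up with a slightly different idea for replacing bad words with good ones. I hope it helps, but it is the best I can offer for now, because I don't know exactly what you are trying to do.
For example:
1. Let's merge the good words and the bad words into a single array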
$wordsTransform = array(
  'shit' => 'ship'
);

2. Your dummy user input
$string = "Rolling In The Deep by Adel\n
\n
There's a fire starting in my heart\n
Reaching a fever pitch, and it's bringing me out the dark\n
Finally I can see you crystal clear\n
Go ahead and sell me out and I'll lay your shit bare";

3. Replace the bad words with good ones
$string = strtr($string, $wordsTransform);

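4. Get the desired output
Rolling In The Deep by Adel

There's a fire starting in my heart

Reaching a fever pitch, and it's bringing me out the dark

Finally I can see you crystal clear

Go ahead and sell me out and I'll lay your ship bare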
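Two limits of strtr are worth seeing before relying on it: it is case-sensitive, and it replaces substrings without regard to word boundaries. A small illustration, reusing the $wordsTransform array from step 1 (the two input strings are made up for the example):

```php
<?php
$wordsTransform = array('shit' => 'ship');

// strtr() is case-sensitive: the capitalised form is left untouched.
echo strtr("Shit happens", $wordsTransform), "\n";     // Shit happens

// strtr() replaces substrings, so an unrelated word is altered too.
echo strtr("shitake mushroom", $wordsTransform), "\n"; // shipake mushroom
```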

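Edit 3:
Following Wrikken's correct comment: I completely forgot that strtr is case-sensitive, and that it is better to respect word boundaries. I borrowed the following example and modified it slightly.
It is the same idea as in my second edit, but it is case-insensitive, it checks word boundaries, and it puts a backslash in front of every character that is part of the regular expression syntax: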
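1. Method:
// Written by Patrick Rauchfuss
// (the class was originally named String, which is a reserved class name in PHP 7 and later)
class StringUtil
{
    // Case-insensitive, word-boundary-aware counterpart of strtr().
    // Modifies $string in place and also returns it.
    public static function stritr(&$string, $from, $to = NULL)
    {
        if (is_string($from))
            // preg_quote() escapes regex special characters (and the "/" delimiter) in the search word
            $string = preg_replace('/\b' . preg_quote($from, '/') . '\b/i', (string) $to, $string);

        else if (is_array($from))
        {
            foreach ($from as $key => $val)
                self::stritr($string, $key, $val);
        }
        return $string;
    }
}

2. The array of good and bad words
$wordsTransform = array(
            'shit' => 'ship'
        );

3. Replacement
StringUtil::stritr($string, $wordsTransform);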

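Preface
By implementing a feature like this, you and/or your code are likely to run into a lot of loopholes (as pointed out many times in the comments), to name just a few:
People will add characters to fool the filter
People will get creative (e.g. innuendo)
People will use passive aggression and sarcasm
People will use sentences/phrases, not just words
You would be better off implementing a moderation/flagging system in which people can flag offensive comments, which can then be edited/removed by mods, users, etc.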
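To make the first loophole concrete: a pattern can be loosened so that it tolerates separators between letters, which catches "b a d" but also produces false positives in ordinary text. This is only a sketch under assumptions: spacedPattern is a hypothetical helper, the words are assumed to be plain ASCII, and the lookarounds `(?<!\w)` / `(?!\w)` are my substitution for a plain `\b`.

```php
<?php
// Hypothetical helper: builds a pattern that tolerates separators between letters.
// Assumes $word is plain ASCII, because str_split() works on bytes.
function spacedPattern(string $word): string
{
    $letters = array_map(function ($ch) {
        return preg_quote($ch, '/');
    }, str_split($word));
    // Allow any run of non-word characters or underscores between two letters.
    return '/(?<!\w)' . implode('[\W_]*', $letters) . '(?!\w)/i';
}

$pattern = spacedPattern('bad');
var_dump(preg_match($pattern, 'that is b a d'));  // int(1)
var_dump(preg_match($pattern, 'that is B.A.D!')); // int(1)
var_dump(preg_match($pattern, 'a badge'));        // int(0)
```

Users can still defeat this with look-alike characters or inserted letters, which is the point of the list above.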
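With that understood, let's move on.
Solution
Given that you:
Have a list of forbidden words $bad_words
Have a list of replacement words $good_words
Want to replace bad words regardless of case
Want to replace bad words with random good words
Have a correctly escaped list of bad words (see preg_quote)
You can do this quite easily with PHP's preg_replace_callback function: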
$input_string = 'This Could be interesting but should it be? Perhaps this \'would\' work; or couldn\'t it?';

$bad_words  = array('could', 'would', 'should');
$good_words = array('might', 'will');

function replace_words($matches){
    global $good_words;
    return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
}

echo preg_replace_callback('/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i', 'replace_words', $input_string);

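OK, so what preg_replace_callback does is compile a regex pattern containing all of the bad words. Matches will then be in the following format:
/(START OR WORD_BOUNDARY OR WHITE_SPACE)(BAD_WORD)(WORD_BOUNDARY OR WHITE_SPACE OR END)/i

The i modifier makes it case-insensitive, so both bad and Bad will match.
The function replace_words then takes the matched word and its boundaries (each either empty or a whitespace character) and replaces it with the boundaries and a random good word.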
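To see what the callback actually receives, the same pattern can be run through preg_match, which fills $matches with the same layout (whole match, then the three groups). A small sketch using the word list from above; the input string is just an example:

```php
<?php
$bad_words = array('could', 'would', 'should');
$pattern = '/(^|\b|\s)(' . implode('|', $bad_words) . ')(\b|\s|$)/i';

// First match only; preg_replace_callback passes the same array shape to the callback.
preg_match($pattern, 'This Could be interesting', $matches);

var_dump($matches[1]); // string(1) " "      leading boundary (the space)
var_dump($matches[2]); // string(5) "Could"  the bad word as typed
var_dump($matches[3]); // string(0) ""       trailing \b matches an empty string
```

Because groups 1 and 3 are put back around the replacement, the surrounding whitespace survives.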
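Function wrapper
If you are going to use this several times, you can also write it as a self-contained function. In that case you will most likely want to pass the good/bad words in when calling it (or hard-code them inside it permanently), but that depends on how you derive them.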
function clean_string($input_string, $bad_words, $good_words){
    return preg_replace_callback(
        '/(^|\b|\s)('.implode('|', $bad_words).')(\b|\s|$)/i',
        function ($matches) use ($good_words){
            return $matches[1].$good_words[rand(0, count($good_words)-1)].$matches[3];
        },
        $input_string
    );
}

echo clean_string($input_string, $bad_words, $good_words);

Output
Running the above function several times in a row with the input and word lists shown in the first example:
This will be interesting but might it be? Perhaps this 'will' work; or couldn't it?
This might be interesting but might it be? Perhaps this 'might' work; or couldn't it?
This might be interesting but will it be? Perhaps this 'will' work; or couldn't it?

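Of course, the replacement words are chosen at random, so if I refreshed the page I would get something else... but this shows what does and does not get replaced.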
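The output above also shows that the replacement loses the original capitalisation ("Could" becomes "will"), which the question asked to keep. One possible way to handle that is to copy the case style of the matched word onto the replacement inside the callback. This is a sketch, not part of the answer's code: match_case and clean_string_keep_case are hypothetical names, only three case styles are recognised (ALL CAPS, Capitalised, everything else lower-cased), and it uses a plain `\b` together with preg_quote.

```php
<?php
// Hypothetical helper: copies the capitalisation style of $original onto $replacement.
function match_case(string $original, string $replacement): string
{
    if (strlen($original) > 1 && strtoupper($original) === $original) {
        return strtoupper($replacement);          // BAD -> GOOD
    }
    if (ucfirst(strtolower($original)) === $original) {
        return ucfirst(strtolower($replacement)); // Bad -> Good
    }
    return strtolower($replacement);              // bad, bAd -> good
}

function clean_string_keep_case(string $input, array $bad_words, array $good_words): string
{
    // Escape each bad word so it is matched literally.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $bad_words);
    return preg_replace_callback(
        '/\b(' . implode('|', $quoted) . ')\b/i',
        function ($m) use ($good_words) {
            // array_rand() returns a random key of the array.
            return match_case($m[1], $good_words[array_rand($good_words)]);
        },
        $input
    );
}

echo clean_string_keep_case('Could it? COULD it? could it?', array('could'), array('might'));
// Might it? MIGHT it? might it?
```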
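Please note
Escaping $bad_words
The bad words are inserted directly into the pattern, so each one should be passed through preg_quote first.
Word boundaries \b
In this code I have used \b, \s, and ^ or $ as word boundaries, and there is a good reason for that. While whitespace, the start of the string and the end of the string are all treated as word boundaries, \b will not match in all cases: it only matches between a word character (letter, digit or underscore) and a non-word character, so a word that begins with a symbol has no boundary in front of it when it follows whitespace. For example: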
\b\$h1t\b <---Will not match
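The difference is easy to check with preg_match. The sketch below compares a pattern that relies on \b alone with the three-group pattern used in this answer; the input string is made up for the example.

```php
<?php
$word = preg_quote('$h1t', '/'); // becomes \$h1t

// \b needs a word character on one side; neither " " nor "$" is one, so this fails.
var_dump(preg_match('/\b' . $word . '\b/i', 'some $h1t here'));                   // int(0)

// The answer's pattern also accepts whitespace as the leading boundary, so it matches.
var_dump(preg_match('/(^|\b|\s)(' . $word . ')(\b|\s|$)/i', 'some $h1t here'));   // int(1)
```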

// Escaping $bad_words: quote each word (including the "/" delimiter) before building the pattern
foreach ($bad_words as $key => $word) {
    $bad_words[$key] = preg_quote($word, '/');
}

Comments
I see some major problems in your future. The only way to prevent offensive user input is to block all user input.
This is hard... please check and post.
To reiterate the other comments, there is a short line in the article: "I want to stick my long-necked giraffe up your fluffy white bunny." Humans are creative; you block one thing and we will find another way.
This really is a mistake. If you implement a word filter, I will find you in your sleep and knock you out!