PHP: highlight 4 consecutive matching words

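I have two strings: one is the model answer and the other is the answer given by a student. I want to highlight every run of 4 consecutive words in the student's answer that also appears in the model answer.

I wrote the function below to match and highlight the words in the answer string: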

function getCopiedText($modelAnswer, $answer) {
    $modelAnsArr = explode(' ', $modelAnswer);
    $answerArr = explode(' ', $answer);
    $common = array_intersect($answerArr, $modelAnsArr);
    if (isset($common) && !empty($common)) {
        $common[max(array_keys($common)) + 2] = '';
        $count = 0;
        $word = '';
        for ($i = 0; $i <= max(array_keys($common)); $i++) {
            if (isset($common[$i])) {
                $count++;
                $word .= $common[$i] . ' ';
            } else {
                if ($count >= 4) {
                    $answer = preg_replace("@($word)@i", '<span style="color:blue">$1</span>', $answer);
                }
                $count = 0;
                $word = '';
            }
        }
    }
    return $answer;
}
Function call:

echo getCopiedText($modelAnswer, $answer);
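Problem: when the $answer string is longer than 300 characters, the function does not return the highlighted string. When $answer is shorter than 300 characters, it does. For example, if $answer is

Lorem Ipsum is simply dummy text of the printing and typesetting industry.

the function returns the highlighted string, but it does not for strings of more than 300 characters.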


I'm not sure, but it looks as if the preg_replace function is the problem. Perhaps the pattern (the first argument of preg_replace) exceeds a length limit.
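
One way to test that suspicion is to check what preg_replace actually returns. It returns null when the pattern cannot be used, and in the question's function that null would overwrite $answer, so nothing is printed. The sketch below uses a made-up $word containing an unbalanced parenthesis (an assumption for illustration, not data from the question) to show the check; it does not prove that a length limit is the cause.

```php
<?php
// Hypothetical input: a matched "word" run that contains an unbalanced ")".
$word = 'dummy text) ';
$answer = 'Some dummy text) here';

// Same pattern construction as in the question; @ only silences the warning.
$result = @preg_replace("@($word)@i", '<span style="color:blue">$1</span>', $answer);

if ($result === null) {
    // preg_last_error() returns one of the PREG_*_ERROR constants.
    echo 'preg_replace failed, error code: ' . preg_last_error() . PHP_EOL;
    // Keep the original text instead of losing it.
    $result = $answer;
}

echo $result . PHP_EOL;
```

If the null branch is taken for a long answer, the pattern itself is worth inspecting before assuming a size limit.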

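I'm not entirely sure what end result you want, but it seems you are trying to highlight any set of 4 consecutive words in the given answer that also occurs in the model answer, in order to spot potential plagiarism.

Based on your comment about retrieving matching sets of 4 words, I'd like to suggest some optimizations.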

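Example:

function getCopiedText($model, $answer)
{
    $test = explode(' ', $answer);
    while ($test) {
        if (count($test) < 4) {
            break;
        }
        //retrieve 4 consecutive words from the answer and remove them
        $words = array_splice($test, 0, 4);
        $phrase = implode(' ', $words);
        //ensure the phrase is found in the model
        if (false !== stripos($model, $phrase)) {
            $answer = str_ireplace($phrase, '<span style="color:blue">' . $phrase . '</span>', $answer);
        }
    }

    return $answer;
}

$modelAnswer = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry`s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';

$answer = 'NOT IN is simply dummy text NOT IN when an unknown printer took a galley -this- is simply dummy text';

echo getCopiedText($modelAnswer, $answer);

Result: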

NOT IN <span style="color:blue">is simply dummy text</span> NOT IN <span style="color:blue">when an unknown printer</span> took a galley -this- <span style="color:blue">is simply dummy text</span>

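A tip for your original approach:

Whenever you pass a variable to a regex function in PHP, make sure it has been escaped with preg_quote. This ensures that special characters in the variable (such as @, \n and \\) are treated as part of the text to match rather than as regex syntax. Pass the pattern delimiter as the second argument so that it is escaped as well.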
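
As a minimal sketch of that tip, the helper below escapes the phrase before building the pattern. The name highlightPhrase and the sample strings are hypothetical; the @ delimiter and the span markup follow the question's code.

```php
<?php
// Hypothetical helper: wrap every case-insensitive occurrence of $phrase in a span.
function highlightPhrase(string $phrase, string $text): string
{
    // Passing '@' as the second argument makes preg_quote escape the delimiter too.
    $pattern = '@(' . preg_quote($phrase, '@') . ')@i';
    $result = preg_replace($pattern, '<span style="color:blue">$1</span>', $text);

    // preg_replace returns null on failure; fall back to the untouched text.
    return $result ?? $text;
}

echo highlightPhrase('cost (in $) @ 5.00', 'The cost (in $) @ 5.00 was fine.');
// prints: The <span style="color:blue">cost (in $) @ 5.00</span> was fine.
```

Without the preg_quote call, the parentheses, the dot, and the @ in this phrase would all be read as pattern syntax or as the closing delimiter.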

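I have added a separate answer because the OP later commented that they do want to match phrases of 4 or more words. My original answer was based on the OP's initial comment about matching 4-word phrases.

I refactored the original answer to use an iterator that walks over every word instead of jumping 4 words at a time. It also lets you specify the minimum number of words per phrase (default 4), handles shortened duplicate phrases, and rewinds when it encounters a partial match.

For example: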

Model: "one two three four one two three four five six seven"
Answer:
    "two three four five two three four five six seven"
Shortened Duplicate:
    "[two three four five] [[two three four five] six seven]"

Answer: 
    "one one two three four"
Partial Match Rewind:
    "one [one two three four]"
This solution is case-insensitive and also takes into account special characters such as @ ( , ) and non-printable characters.

Source:
function getCopiedText($model, $answer, $min = 4)
{
    //ensure there are not double spaces
    $model = str_replace('  ', ' ', $model);
    $answer = str_replace('  ', ' ', $answer);
    $test = new CachingIterator(new ArrayIterator(explode(' ', $answer)));
    $words = $matches = [];
    $p = $match = null;
    //test each word
    foreach($test as $i => $word) {
        $words[] = $word;
        $count = count($words);
        if ($count === 2) {
            //save pointer at second word
            $p = $i;
        }
        //check if the phrase of words exists in the model
        if (false !== stripos($model, $phrase = implode(' ', $words))) {
            //only match phrases with the minimum or more words
            if ($count >= $min) {
                //reset back to here for more matches
                $match = $phrase;
                if (!$test->hasNext()) {
                    //end of the answer reached: record the final phrase
                    $matches[$match] = true;
                    $p = null;
                }
            }
        } else {
            //the phrase of words was no longer found
            if (null !== $match && !isset($matches[$match])) {
                //add the matched phrase to the list of matches
                $matches[$match] = true;
                $p = null;
                $iterator = $test->getInnerIterator();
                if ($iterator->valid()) {
                    //rewind pointer back to the current word since the current word may be part of the next phrase
                    $iterator->seek($i);
                }
            } elseif (null !== $p) {
                //match not found, determine if we need to rewind the pointer
                $iterator = $test->getInnerIterator();
                if ($iterator->valid()) {
                    //rewind pointer back to second word since a partial phrase less than 4 words was matched
                    $iterator->seek($p);
                }
                $p = null;
            }
            //reset testing
            $words = [];
            $match = null;
        }
    }

    //highlight the matched phrases in the answer
    if (!empty($matches)) {
        $phrases = array_keys($matches);
        //sort phrases by the length
        array_multisort(array_map('strlen', $phrases), $phrases);

        //filter the matches as regular expression patterns
        //order by longest phrase first to ensure double highlighting of smaller phrases
        $phrases  = array_map(function($phrase) {
            return '/(' . preg_quote($phrase, '/') . ')/iu';
        }, array_reverse($phrases));

        $answer = preg_replace($phrases, '<span style="color:blue">$0</span>', $answer);
    }

    return $answer;
}
$modelAnswer = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry`s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';

$answer ='NOT IN is simply dummy text NOT in when an unknown printer took a galley -this- is simply dummy text of the printing and typesetting industry';

echo getCopiedText($modelAnswer, $answer);
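
As a cross-check on the partial-match rewind, here is a simpler sketch that slides a fixed window one word at a time and brackets the covered words. It is not the answer's algorithm: markCopiedRuns is a hypothetical helper, it uses the same stripos substring test as the code above, and it merges adjacent matches into a single run instead of nesting them, so it will not reproduce the shortened-duplicate output.

```php
<?php
// Hypothetical helper, not taken from the answer: brackets every run of answer
// words covered by at least one $min-word window that occurs in the model.
function markCopiedRuns(string $model, string $answer, int $min = 4): string
{
    // Split on any whitespace so repeated spaces do not create empty words.
    $words = preg_split('/\s+/', trim($answer), -1, PREG_SPLIT_NO_EMPTY);
    $model = preg_replace('/\s+/', ' ', trim($model));

    $covered = array_fill(0, count($words), false);
    // Advance one word at a time, so a failed window never skips a later match.
    for ($i = 0; $i + $min <= count($words); $i++) {
        $phrase = implode(' ', array_slice($words, $i, $min));
        if (false !== stripos($model, $phrase)) {
            for ($j = $i; $j < $i + $min; $j++) {
                $covered[$j] = true;
            }
        }
    }

    // Turn consecutive covered words into bracketed runs.
    $out = [];
    $open = false;
    foreach ($words as $k => $w) {
        if ($covered[$k] && !$open) {
            $w = '[' . $w;
            $open = true;
        }
        if ($open && !($covered[$k + 1] ?? false)) {
            $w .= ']';
            $open = false;
        }
        $out[] = $w;
    }

    return implode(' ', $out);
}

echo markCopiedRuns('one two three four one two three four five six seven', 'one one two three four');
// prints: one [one two three four]
```

The window starting at the first "one" fails, the next window starts at the second word and matches, which is the same outcome the rewind produces in the example above.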