PHP: smart, error-tolerant string comparison

php string

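I'm looking for a routine or method for error-tolerant string comparison.

Let's say we have the test string Čakánka (yes, it contains CE characters).

Now I want to accept any of the following strings as OK:

cakanka
cákanká
ČaKaNKA
CAKANKA
CAAKNKA
CKAANKA
cakakNa

The problem is that I often swap letters within a word, and I want to minimize users' frustration at not being able to write a word correctly (e.g. when they are in a hurry).

So, I know how to do a case-insensitive comparison (just lowercase everything :]), and I can strip the CE characters, but I can't figure out how to tolerate a few swapped characters.

Also, you often don't just put one character in the wrong place (character => cahracter); sometimes you shift it by several positions (character => carahcter), simply because one finger was lazy while typing.

Thank you :]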

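Spell checkers do something like this. Maybe you could adapt an algorithm based on that reference, or take the spell-check suggestion code from an open-source project.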

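Not sure (especially about the accented/special characters, which you will probably have to deal with first), but for characters that are in the wrong place or missing, the levenshtein function, which calculates the Levenshtein distance between two strings, might help you (quoting):

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2.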
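
To make the quoted definition concrete, here is a minimal sketch of a distance-based check. The helper name isCloseMatch and the default threshold of 2 are assumptions for illustration, not part of the original answer, and the sketch assumes the strings are already plain ASCII (accents removed beforehand), because levenshtein() compares bytes.

```php
<?php
// Hypothetical helper (not from the original answer): accept an input when
// its Levenshtein distance to the expected word is at most $maxDistance.
function isCloseMatch(string $input, string $expected, int $maxDistance = 2): bool
{
    // levenshtein() works on bytes, so this sketch assumes both strings are
    // already plain ASCII; strtolower() makes the check case-insensitive.
    $distance = levenshtein(strtolower($input), strtolower($expected));
    return $distance <= $maxDistance;
}

$expected = 'cakanka';
foreach (array('CAKANKA', 'CAAKNKA', 'CKAANKA', 'cakakNa', 'banana') as $input) {
    $distance = levenshtein(strtolower($input), $expected);
    printf(
        "%-8s distance=%d accepted=%s\n",
        $input,
        $distance,
        isCloseMatch($input, $expected) ? 'yes' : 'no'
    );
}
```

Plain Levenshtein has no transposition operation, so swapping two adjacent letters costs 2 (two substitutions). Both cahracter and carahcter are at distance 2 from character, which is why the sketch uses 2 as its threshold. For short words that threshold is probably too permissive, so scaling it with the word length may be worth considering.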

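Other possibly useful functions could be soundex, similar_text, or metaphone.

Some of the user notes on the manual pages of those functions might also bring you some useful stuff ;-)

You could transliterate the words to Latin characters and use a phonetic algorithm such as Soundex to extract the essence of your word and compare it with the ones you already have. In your case, all of your words would be C252, except for the last one, which is C250.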
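
As a rough illustration of the transliterate-then-digest idea, the sketch below computes both a Soundex and a Metaphone key for a word. The helper name phoneticKeys is hypothetical, and the fallback to the raw input when iconv fails is an assumption of this sketch; what //TRANSLIT produces depends on the iconv implementation and the current locale.

```php
<?php
// Hypothetical helper: transliterate a UTF-8 word to ASCII, then compute
// two phonetic digests so they can be compared side by side.
function phoneticKeys(string $word): array
{
    // //TRANSLIT is implementation- and locale-dependent; if the conversion
    // fails, this sketch simply falls back to the untouched input.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $word);
    if ($ascii === false) {
        $ascii = $word;
    }
    return array(
        'soundex'   => soundex($ascii),
        'metaphone' => metaphone($ascii),
    );
}

foreach (array('cakanka', 'CAKANKA', 'CAAKNKA', 'Čakánka') as $word) {
    $keys = phoneticKeys($word);
    echo $word, ': soundex=', $keys['soundex'],
        ' metaphone=', $keys['metaphone'], "\n";
}
```

Words that share a key can then be treated as candidates for a closer comparison. Keep in mind that a swap involving consonants near the start of the word can change the digest, so a key lookup alone will not catch every typo.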

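Edit: The problem with comparison functions like levenshtein or similar_text is that you need to call them for each pair of input value and possible match value. That means that if you have a database with 1 million entries, you will need to call these functions 1 million times.

But functions like soundex or metaphone, which calculate some kind of digest, can help reduce the number of actual comparisons. If you store the soundex or metaphone value for each known word in your database, you can cut down the number of possible matches very quickly. Later, once the set of possible matches has been reduced, you can use the comparison functions to find the best match.

Here's an example: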

// Build the index that represents your database:
// Soundex code => list of keys into $knownWords.
$knownWords = array('Čakánka', 'Cakaka');
$index = array();
foreach ($knownWords as $key => $word) {
    // Transliterate to ASCII before hashing; the result of //TRANSLIT
    // depends on the iconv implementation and the current locale.
    $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
    if (!isset($index[$code])) {
        $index[$code] = array();
    }
    $index[$code][] = $key;
}

// Test words
$testWords = array('cakanka', 'cákanká', 'ČaKaNKA', 'CAKANKA', 'CAAKNKA', 'CKAANKA', 'cakakNa');
echo '<ul>';
foreach ($testWords as $word) {
    $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
    if (isset($index[$code])) {
        echo '<li> '.$word.' is similar to: ';
        $matches = array();
        foreach ($index[$code] as $key) {
            // mb_strtolower() (mbstring extension) is used because
            // strtolower() does not reliably lowercase multibyte characters such as Č.
            similar_text(mb_strtolower($word, 'UTF-8'), mb_strtolower($knownWords[$key], 'UTF-8'), $percentage);
            $matches[$knownWords[$key]] = $percentage;
        }
        // Best match first.
        arsort($matches);
        echo '<ul>';
        foreach ($matches as $match => $percentage) {
            echo '<li>'.$match.' ('.$percentage.'%)</li>';
        }
        echo '</ul></li>';
    } else {
        echo '<li>no match found for '.$word.'</li>';
    }
}
echo '</ul>';

Accents are not a problem; the first thing I'd do is uppercase the string and then replace accented characters with their unaccented versions (ž => z). I'll look into your suggestions; one of those functions will be helpful, I'm 100% sure.

Out of curiosity, when you say "one of those functions", which one exactly do you have in mind? The levenshtein one, or another?

I'd probably go with similar_text - I need to check names (

This is interesting, but probably too fuzzy for my needs. Thanks.
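
The approach described in the comments above (uppercase the string, replace accented characters, then compare with similar_text) could be sketched roughly as follows. The accent map is deliberately tiny and illustrative, and the names ACCENT_MAP, normalizeName and nameSimilarity are hypothetical; none of them come from the original thread.

```php
<?php
// Illustrative, incomplete map of accented characters to ASCII; a real
// application would need a fuller table for its own alphabet.
const ACCENT_MAP = array(
    'Č' => 'C', 'č' => 'c', 'Á' => 'A', 'á' => 'a',
    'Ž' => 'Z', 'ž' => 'z', 'Š' => 'S', 'š' => 's',
);

// Hypothetical helper: strip accents via the map, then uppercase the
// resulting ASCII string.
function normalizeName(string $name): string
{
    return strtoupper(strtr($name, ACCENT_MAP));
}

// Hypothetical helper: similarity percentage between two normalized names,
// as reported by similar_text() through its third (by-reference) argument.
function nameSimilarity(string $a, string $b): float
{
    similar_text(normalizeName($a), normalizeName($b), $percent);
    return $percent;
}

foreach (array('cákanká', 'ČaKaNKA', 'CAAKNKA') as $input) {
    printf("%s vs Čakánka: %.1f%%\n", $input, nameSimilarity($input, 'Čakánka'));
}
```

Doing the accent replacement before changing case means only ASCII letters reach strtoupper(), so no multibyte-aware case function is needed. Which percentage counts as "close enough" for names is a judgment call left to the caller.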