PHP: searching for keywords in a string and improving the quality and accuracy of keyword extraction
I have a piece of PHP code, shown below:
$Keywords = array(
    ', JOE.' => '1',
    ', JOE' => '2',
    'JOE' => '3',
    'JOE.' => '4',
    '/JOE' => '5',
    '/JOE/' => '6',
    'JOE/.' => '7',
    ',JOE.' => '8'
);
$Text = "JOE is JOE is JOE is JOE is JOE is JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";
extract_keyword ($Keywords, $Text);
function extract_keyword ($Keywords, $Text){
    mb_internal_encoding('UTF-8');
    uksort($Keywords, function ($a, $b) {
        $as = mb_strlen($a);
        $bs = mb_strlen($b);
        if ($as > $bs) {
            return -1;
        }
        else if ($bs > $as) {
            return 1;
        }
        return 0;
    });
    $Keywords_ci = array();
    foreach ($Keywords as $k => $v) {
        $Keywords_ci[$k] = $v;
    }
    $re = '/\b(?:' . join('|', array_map(function($keyword) {
        return preg_quote($keyword, '/');
    }, array_keys($Keywords))) . ')\b/i';
    $KeywordArrayKey = array();
    $KeywordArrayValue = array();
    $NewArray = array();
    preg_match_all($re, $Text, $matches);
    foreach ($matches[0] as $keyword) {
        $KeywordArrayKey[] = $keyword;
        $KeywordArrayValue[] = $Keywords_ci[$keyword];
        if(!empty($keyword) && !empty($Keywords_ci[$keyword])) {
            $NewArray[] = array($keyword => $Keywords_ci[$keyword]);
        }
    }
    print_r($NewArray) ."<br><br>";
}
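As you can see, the problem is that the code is not precise enough to extract the entries of $Keywords that actually occur in the text, such as ', JOE.' => '1' or 'JOE/.' => '7'. My goal is to keep '/JOE' => '5' completely separate from '/JOE/' => '6', from 'JOE.' => '4', and so on. Please take a look at the code and let me know how I can improve the quality and accuracy of the keyword extraction. Thanks for your help.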
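Note 1: print_r($Keywords_ci) prints Array ( [, JOE.] => 1 [JOE/.] => 7 [,JOE.] => 8 [, JOE] => 2 [/JOE/] => 6 [JOE.] => 4 [/JOE] => 5 [JOE] => 3 ), but what I am looking for is to echo every instance of an available keyword in $Text, for example '/JOE/' => '6' or ',JOE.' => '8'.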
Note 2: the expected output of print_r($NewArray) is the 14-element array listed after the comments.
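Once the keywords are sorted from longest to shortest, you know that a string will be checked before any of its possible substrings (/JOE/ before /JOE). You can therefore use str_replace to remove the actual matches, so that a later search for /JOE does not match inside /JOE/ (assuming /JOE/ was searched for earlier). Use the count parameter of str_replace to get the number of matches.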
<?php
$Keywords = array(
    ', JOE.' => '1',
    ', JOE' => '2',
    'JOE' => '3',
    'JOE.' => '4',
    '/JOE' => '5',
    '/JOE/' => '6',
    'JOE/.' => '7',
    ',JOE.' => '8'
);
$Text = "JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";
// Sort the keywords by length, longest first.
uksort($Keywords, function ($a, $b) {
    $as = mb_strlen($a);
    $bs = mb_strlen($b);
    if ($as > $bs) {
        return -1;
    }
    else if ($bs > $as) {
        return 1;
    }
    return 0;
});
$copy = $Text;
// Initialise the result so print_r() also works when nothing matches.
$result = array();
foreach ($Keywords as $keyword => $value) {
    // Remove every occurrence of the keyword; $count receives how many were removed.
    $copy = str_replace($keyword, '', $copy, $count);
    if ($count > 0) {
        $result[$keyword] = $value;
    }
}
print_r($result);
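The loop above only records whether a keyword occurred at all. As a rough sketch of the "use the count parameter" idea, the same longest-first pass can keep the number of matches per keyword. The function name count_keyword_matches is hypothetical. Replacing each match with a NUL byte instead of an empty string is an assumption made here, so that removing a match cannot glue the surrounding text together into a new, spurious match; it presumes no keyword contains a NUL byte.

```php
<?php
// Hypothetical helper, not part of the original answer: returns an array
// of keyword => number of occurrences, checking longer keywords first.
function count_keyword_matches(array $keywords, string $text): array
{
    // Longest keywords first, so '/JOE/' is consumed before '/JOE' or 'JOE'.
    uksort($keywords, function ($a, $b) {
        return mb_strlen((string) $b) - mb_strlen((string) $a);
    });

    $counts = array();
    foreach ($keywords as $keyword => $value) {
        // Assumption: "\0" never appears in a keyword, so it is a safe placeholder.
        $text = str_replace((string) $keyword, "\0", $text, $count);
        if ($count > 0) {
            $counts[$keyword] = $count;
        }
    }
    return $counts;
}
```

Calling count_keyword_matches($Keywords, $Text) leaves out keywords that never occur on their own, such as '/JOE' when every '/JOE' in the text is part of '/JOE/'.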
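Thanks for your answer. When I print_r($Keywords_ci) in my current code, I get the same result as your print_r($result). What I am after, however, is improving the quality of the matching according to the number of available keywords.
@Apiah can you add the expected output?
Don't reinvent the wheel. Take a look at Elasticsearch and Solr. Both are built on Lucene, but they are much easier to use from PHP.
@LaurynasTretjakovas, thanks for your comment. Unfortunately I am not a professional programmer. Could you give me an example of how to use Elasticsearch or Solr for my problem?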
Expected output of print_r($NewArray), as referred to in Note 2:
Array (
[0] => Array ( [JOE] => 3 )
[1] => Array ( [JOE] => 3 )
[2] => Array ( [JOE] => 3 )
[3] => Array ( [JOE] => 3 )
[4] => Array ( [JOE] => 3 )
[5] => Array ( [JOE] => 3 )
[6] => Array ( [JOE.] => 4 )
[7] => Array ( [, JOE.] => 1 )
[8] => Array ( [,JOE.] => 8 )
[9] => Array ( [, JOE.] => 1 )
[10] => Array ( [JOE/.] => 7 )
[11] => Array ( [,JOE.] => 8 )
[12] => Array ( [/JOE/] => 6 )
[13] => Array ( [, JOE] => 2 ) )
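One possible way to get a per-occurrence list like the one above is a single preg_match_all pass over an alternation of the escaped keywords, sorted longest first and without the \b anchors. The \b anchors cannot work for keywords that start or end with punctuation such as ',' or '/', because a word boundary is not guaranteed next to a non-word character. This is only a sketch: the function name extract_keyword_occurrences is hypothetical, and matching is case-sensitive here so that each match can be looked up directly in the keyword array.

```php
<?php
// Hypothetical helper, not from the original post: lists every keyword
// occurrence in $text, in order of appearance, as array(keyword => value).
function extract_keyword_occurrences(array $keywords, string $text): array
{
    if (count($keywords) === 0) {
        return array(); // an empty alternation would match everywhere
    }

    // Longest keywords first: at any position the regex engine tries the
    // alternatives in order, so ', JOE.' wins over ', JOE' and 'JOE'.
    uksort($keywords, function ($a, $b) {
        return mb_strlen((string) $b) - mb_strlen((string) $a);
    });

    // Escape each keyword literally and join them into one pattern.
    $parts = array();
    foreach (array_keys($keywords) as $keyword) {
        $parts[] = preg_quote((string) $keyword, '/');
    }
    $pattern = '/' . implode('|', $parts) . '/';

    preg_match_all($pattern, $text, $matches);

    $found = array();
    foreach ($matches[0] as $match) {
        $found[] = array($match => $keywords[$match]);
    }
    return $found;
}
```

With the $Keywords and $Text from the question, print_r(extract_keyword_occurrences($Keywords, $Text)) should list the 14 entries in the same order as the expected output.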