PHP: check which tag is most relevant to a title
There are two database tables:
+----+---------------+
| id | tag           |
+----+---------------+
| 1  | Audi          |
| 2  | BMW           |
| 3  | Volkswagen    |
| 4  | Mercedes Benz |
+----+---------------+
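and a second table, title, from which the code below reads the title and added columns.
What I need to do:
1. Check which tag is most relevant to the title
2. Extract the most relevant tag and insert it into the database next to the title
What I have done so far: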
function compareStrings($s1, $s2) {
    // one is empty, so no result
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }
    // replace non-alphanumeric characters
    // "-" is kept in case it is used to combine words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);
    // collapse double spaces into single spaces
    while (strpos($s1clean, "  ") !== false) {
        $s1clean = str_replace("  ", " ", $s1clean);
    }
    while (strpos($s2clean, "  ") !== false) {
        $s2clean = str_replace("  ", " ", $s2clean);
    }
    // create arrays
    $ar1 = explode(" ", $s1clean);
    $ar2 = explode(" ", $s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);
    // swap the arrays if needed so ar1 is always the largest
    if ($l2 > $l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }
    // flip array 2 to make the words the keys
    $ar2 = array_flip($ar2);
    $maxwords = max($l1, $l2);
    $matches = 0;
    // find matching words
    foreach ($ar1 as $word) {
        if (array_key_exists($word, $ar2)) {
            $matches++;
        }
    }
    return ($matches / $maxwords) * 100;
}

$all_values = '';
$sql_object = "SELECT * FROM tag";
$result_object = mysql_query($sql_object);
while ($row_object = mysql_fetch_array($result_object)) {
    $tag = $row_object['tag'];
    $sql_subject = "SELECT * FROM title ORDER BY added";
    $result_subject = mysql_query($sql_subject);
    while ($row_subject = mysql_fetch_array($result_subject)) {
        $title = $row_subject['title'];
        $all_values .= "Title($title) and Tag($tag) relevancy:" . compareStrings($tag, $title) . "%" . "<br/>";
    }
}
echo $all_values;
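The question is: how do I extract the most relevant tag from $all_values and insert it into the database? This is where I am stuck. Or maybe there is a better solution. Any help would be appreciated.

In this example it is easy, because one tag has some relevancy and the others have none. So it is not really a representative example, is it? Reality will be more complex.

But the point is that I need to extract the tag with the highest relevancy percentage and insert it into the db.

Fair enough. That was neatly glossed over in this example.

Put the values into an array, sort it so that you get the tag with the highest value, and then insert that into the database...

FYI, database queries nested inside loops are bad for performance. There is no reason to nest the queries here in the first place. Read all tags in one query and all titles in another, and put both into arrays. Then loop over those arrays instead of running the same database query over and over.

You have to define how relevancy is determined. Simply dividing the number of occurrences by the number of words may not be enough, because most of the time a tag appears only once in the title.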
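
The advice in the comments (load both tables into arrays once, then keep only the highest-scoring tag per title) could look roughly like the sketch below. It is not the asker's code: wordOverlapPercent() and bestTagForTitle() are hypothetical names, the scoring mirrors compareStrings() from the question, and the arrays stand in for the results of one SELECT on tag and one SELECT on title. The title ids and the tag_id column mentioned in the closing comment are assumptions, since the layout of the title table is not shown.

```php
<?php
// Hypothetical helper with the same idea as compareStrings() in the question:
// matching words divided by the word count of the longer string, as a percentage.
function wordOverlapPercent(string $a, string $b): float
{
    $wordsA = preg_split('/[^A-Za-z0-9-]+/', $a, -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/[^A-Za-z0-9-]+/', $b, -1, PREG_SPLIT_NO_EMPTY);
    if (!$wordsA || !$wordsB) {
        return 0.0;
    }
    // Make $wordsA the longer list, as the original function does.
    if (count($wordsB) > count($wordsA)) {
        [$wordsA, $wordsB] = [$wordsB, $wordsA];
    }
    $lookup = array_flip($wordsB);
    $matches = 0;
    foreach ($wordsA as $word) {
        if (isset($lookup[$word])) {
            $matches++;
        }
    }
    return 100.0 * $matches / count($wordsA);
}

// Returns the best-scoring tag for one title, or tag_id => null if nothing matches.
// $tags maps tag id => tag text.
function bestTagForTitle(string $title, array $tags): array
{
    $best = ['tag_id' => null, 'tag' => null, 'score' => 0.0];
    foreach ($tags as $id => $tag) {
        $score = wordOverlapPercent($tag, $title);
        if ($score > $best['score']) {
            $best = ['tag_id' => $id, 'tag' => $tag, 'score' => $score];
        }
    }
    return $best;
}

// Stand-ins for the two query results (each table is read only once).
$tags = [1 => 'Audi', 2 => 'BMW', 3 => 'Volkswagen', 4 => 'Mercedes Benz'];
$titles = [1 => 'Audi is a great car']; // title ids are assumed

foreach ($titles as $titleId => $title) {
    $best = bestTagForTitle($title, $tags);
    if ($best['tag_id'] !== null) {
        echo "Title($title) best tag: {$best['tag']}\n";
        // Here the pair ($titleId, $best['tag_id']) would be written back with a
        // prepared statement, e.g. into an assumed tag_id column of the title table.
    }
}
```

Keeping a running maximum avoids sorting the full score list when only the top tag is needed. Note also that the mysql_* functions used in the question were removed in PHP 7.0, so the two SELECTs and the write-back would need mysqli or PDO on current PHP versions.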
Output of the script:
Title(Audi is a great car) and Tag(Audi) relevancy:20%
Title(Audi is a great car) and Tag(BMW) relevancy:0%
Title(Audi is a great car) and Tag(Volkswagen) relevancy:0%
Title(Audi is a great car) and Tag(Mercedes Benz) relevancy:0%
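
The 20% for Audi is one matching word divided by the five words of the title, so a correct tag scores lower the longer the title gets. One possible alternative, sketched here under the assumption that relevancy means "share of the tag's words found in the title" and that case should be ignored, normalizes by the tag length instead. tagCoveragePercent() is a hypothetical name, and this definition is only one of several that could fit.

```php
<?php
// Assumption: relevancy = percentage of the tag's distinct words that occur
// in the title, compared case-insensitively. Not the asker's definition.
function tagCoveragePercent(string $tag, string $title): float
{
    // Lower-case the text and split it on anything that is not a letter, digit or "-".
    $split = static function (string $s): array {
        return preg_split('/[^a-z0-9-]+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY);
    };

    $tagWords = array_unique($split($tag));
    if (!$tagWords) {
        return 0.0;
    }

    // Title words become array keys for constant-time lookups.
    $titleWords = array_flip($split($title));

    $found = 0;
    foreach ($tagWords as $word) {
        if (isset($titleWords[$word])) {
            $found++;
        }
    }
    return 100.0 * $found / count($tagWords);
}
```

With this definition a single-word tag that appears in the title scores 100 regardless of title length, and a two-word tag with only one word present scores 50. The drawback is that it no longer penalizes very long titles at all, so ties between several fully covered tags would need a separate rule.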