PHP: check which tag is most relevant to a title


There are two tables, tag and title. The tag table looks like this:

+-----+----------------+
| id  | tag            |
+-----+----------------+
|  1  | Audi           |
|  2  | BMW            |
|  3  | Volkswagen     |
|  4  | Mercedes Benz  |
+-----+----------------+

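What I need to do:

1. Check which tag is most relevant to the title.

2. Extract the most relevant tag and insert it into the database next to the title.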

What I have done so far:

function compareStrings($s1, $s2) {
    //one is empty, so no result
    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }

    //replace non-alphanumeric characters with spaces
    //'-' is kept in case it is used to combine words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);

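    //remove double spaces
    while (strpos($s1clean, "  ")!==false) {
        $s1clean = str_replace("  ", " ", $s1clean);
    }
    while (strpos($s2clean, "  ")!==false) {
        $s2clean = str_replace("  ", " ", $s2clean);
    }

    //remove leading/trailing spaces (e.g. left by trailing punctuation),
    //which explode() would otherwise turn into empty "words"
    $s1clean = trim($s1clean);
    $s2clean = trim($s2clean);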

    //create arrays
    $ar1 = explode(" ",$s1clean);
    $ar2 = explode(" ",$s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    //flip the arrays if needed so ar1 is always largest.
    if ($l2>$l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    //flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);


    $maxwords = max($l1, $l2);
    $matches = 0;

    //find matching words
    foreach($ar1 as $word) {
        if (array_key_exists($word, $ar2))
            $matches++;
    }

    return ($matches / $maxwords) * 100;    
}


$all_values = '';

$sql_object = "SELECT * FROM tag";
$result_object = mysql_query($sql_object);
while($row_object = mysql_fetch_array($result_object))
{
    $tag = $row_object['tag'];

    $sql_subject = "SELECT * FROM title ORDER BY added";
    $result_subject = mysql_query($sql_subject);
    while($row_subject = mysql_fetch_array($result_subject))
    {
        $title = $row_subject['title'];

        $all_values .= "Title($title) and Tag($tag) relevancy:". compareStrings($tag, $title) . "%"."<br/>";
    }
}
echo $all_values;

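The question is: how do I extract the most relevant tag from $all_values and insert it into the database? This is where I'm stuck. Or maybe there is a better solution. Any help would be appreciated.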
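A minimal sketch of the extraction step, assuming the tag names are already loaded into a PHP array: instead of concatenating the scores into the $all_values string, keep them as numbers and remember the highest one. mostRelevantTag is a hypothetical helper name, and compareStrings is repeated in a compact form (preg_split with PREG_SPLIT_NO_EMPTY, same matches-divided-by-longer-word-count formula) so the snippet runs on its own; in a real script it would replace the original function, not sit beside it.

```php
<?php
declare(strict_types=1);

// Compact equivalent of the question's compareStrings(): matching words
// divided by the word count of the longer string, as a percentage.
// PREG_SPLIT_NO_EMPTY drops empty "words" caused by leading/trailing punctuation.
function compareStrings(string $s1, string $s2): float
{
    $w1 = preg_split('/[^A-Za-z0-9-]+/', $s1, -1, PREG_SPLIT_NO_EMPTY);
    $w2 = preg_split('/[^A-Za-z0-9-]+/', $s2, -1, PREG_SPLIT_NO_EMPTY);
    if (count($w1) === 0 || count($w2) === 0) {
        return 0.0;
    }
    if (count($w2) > count($w1)) {
        [$w1, $w2] = [$w2, $w1]; // make $w1 the longer list
    }
    $lookup = array_flip($w2); // words of the shorter list as keys
    $matches = 0;
    foreach ($w1 as $word) {
        if (isset($lookup[$word])) {
            $matches++;
        }
    }
    return $matches / count($w1) * 100;
}

/**
 * Hypothetical helper: the tag with the highest compareStrings() score for
 * $title, or null if no tag scores above 0. On a tie the earlier tag wins.
 */
function mostRelevantTag(array $tags, string $title): ?string
{
    $bestTag = null;
    $bestScore = 0.0;
    foreach ($tags as $tag) {
        $score = compareStrings($tag, $title);
        if ($score > $bestScore) { // strict '>' keeps the first tag on a tie
            $bestScore = $score;
            $bestTag = $tag;
        }
    }
    return $bestTag;
}

$tags = ['Audi', 'BMW', 'Volkswagen', 'Mercedes Benz'];
var_dump(mostRelevantTag($tags, 'Audi is a great car'));   // string(4) "Audi"
var_dump(mostRelevantTag($tags, 'Nothing relevant here')); // NULL
```

The returned tag (or null when nothing matches) is the value to write to the database; the exact statement depends on the layout of the title table, which the question does not show.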

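In this example it's simple, because one tag has some relevance and the others have none. So this isn't really a representative example, is it? Real data will be more complicated.

But the point is, I need to extract the tag with the highest relevance percentage and insert it into the database.

Which this example conveniently leaves out. Put the contents into an array, sort it so that you get the tag with the highest value, and then insert that tag into the database.

FYI, database queries nested inside loops are bad for performance, and there is no reason to nest the queries here in the first place. Read all tags in one query and all titles in another, put both into arrays, and then loop over those arrays instead of running the same database query over and over.

You have to define how relevance is determined. Simply dividing the number of occurrences by the number of words may not be enough, because most of the time a tag appears in a title only once.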
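The suggestion in the comments above (read each table once, work on arrays, then store the top-scoring tag) could look roughly like this. It is a sketch under assumptions that are not in the question: it uses PDO rather than the mysql_* functions (which were removed in PHP 7), it assumes the title table has an id primary key and a tag column that receives the result, and tagAllTitles is a made-up name. The tag-picking rule is passed in as a callable, so any function (array $tags, string $title): ?string built on compareStrings can be plugged in; the demo uses a plain substring check as a stand-in and an in-memory SQLite database (pdo_sqlite extension).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: two SELECTs in total, scoring done in memory, and one
 * prepared UPDATE reused for every title. Returns the number of titles tagged.
 * Assumes title(id, title, added, tag); the id and tag columns are assumptions.
 */
function tagAllTitles(PDO $pdo, callable $pickTag): int
{
    // Query 1: every tag name as a flat list of strings.
    $tags = $pdo->query('SELECT tag FROM tag')->fetchAll(PDO::FETCH_COLUMN);
    // Query 2: every title as [id => title].
    $titles = $pdo->query('SELECT id, title FROM title ORDER BY added')
        ->fetchAll(PDO::FETCH_KEY_PAIR);

    $update = $pdo->prepare('UPDATE title SET tag = ? WHERE id = ?');
    $updated = 0;
    foreach ($titles as $id => $title) {
        $tag = $pickTag($tags, $title); // no database access inside this loop
        if ($tag !== null) {
            $update->execute([$tag, $id]);
            $updated++;
        }
    }
    return $updated;
}

// Stand-in picker: first tag that occurs in the title (case-insensitive).
// In real code, pass a picker built on compareStrings instead.
$firstTagInTitle = function (array $tags, string $title): ?string {
    foreach ($tags as $tag) {
        if (stripos($title, $tag) !== false) {
            return $tag;
        }
    }
    return null;
};

// Demo data in an in-memory SQLite database (hypothetical schema).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT)');
$pdo->exec('CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT, added INTEGER, tag TEXT)');
$pdo->exec("INSERT INTO tag (tag) VALUES ('Audi'), ('BMW'), ('Volkswagen'), ('Mercedes Benz')");
$pdo->exec("INSERT INTO title (title, added) VALUES ('Audi is a great car', 1), ('Nothing relevant here', 2)");

$count = tagAllTitles($pdo, $firstTagInTitle);
echo "$count title(s) tagged\n";
print_r($pdo->query('SELECT title, tag FROM title')->fetchAll(PDO::FETCH_ASSOC));
```

The function itself only uses portable SQL, so for MySQL the main change would be the DSN passed to new PDO(...); the CREATE TABLE lines exist only to make the demo self-contained.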
Output of the code above for the title "Audi is a great car":

Title(Audi is a great car) and Tag(Audi) relevancy:20%
Title(Audi is a great car) and Tag(BMW) relevancy:0%
Title(Audi is a great car) and Tag(Volkswagen) relevancy:0%
Title(Audi is a great car) and Tag(Mercedes Benz) relevancy:0%
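In this output the only matching tag still scores just 20%, because compareStrings divides by the word count of the longer string, so a correct one-word tag scores lower the longer the title gets. One possible alternative, offered here as an assumption rather than anything from the question, is to score the share of the tag's own words that appear in the title, ignoring case. tagCoverage is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical alternative to compareStrings(): the percentage of the tag's
 * own words that also occur in the title, compared case-insensitively.
 * A tag fully contained in the title scores 100, whatever the title length.
 */
function tagCoverage(string $tag, string $title): float
{
    // Lower-case, then split on anything that is not a letter, digit or '-'.
    $split = fn(string $s): array =>
        preg_split('/[^a-z0-9-]+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY);

    $tagWords = array_unique($split($tag));
    if (count($tagWords) === 0) {
        return 0.0;
    }
    $titleWords = array_flip($split($title)); // title words as keys for lookup

    $found = 0;
    foreach ($tagWords as $word) {
        if (isset($titleWords[$word])) {
            $found++;
        }
    }
    return $found / count($tagWords) * 100;
}

echo tagCoverage('Audi', 'Audi is a great car'), "\n";                 // 100
echo tagCoverage('Mercedes Benz', 'My Mercedes is a great car'), "\n"; // 50
echo tagCoverage('BMW', 'Audi is a great car'), "\n";                  // 0
```

With this measure a title that names two brands can give both tags 100%, so a tie-breaker (for example preferring the tag with more words) would still have to be chosen.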