PHP: How to fuzzy search JSON in memory instead of MySQL


The project_row table, with up to 15k rows per project_id:

ID | source                       |  target     | project_id
----------------------
0  | hello world                  | hola tierra | 2
1  | hello                        | hola        | 2
2  | what                         |             | 2
3  | you                          | tu          | 2
4  | nevermind. I will head home. |             | 2
5  | There are many sentences     |             | 2
   | stored in the source column  |             | 2
   | as well!                     |             | 2
...

The glossary_term table, with up to 40k rows per glossary_id:

ID | source                       |  target               | glossary_id
----------------------
0  | hello world                  | hola tierra           | 2
1  | hello word                   | hola palabra          | 2
2  | what?                        | que?                  | 2
3  | your                         | tu                    | 2
4  | I will head home.            | Me voy a la casa      | 2
5  | There are many sentences     | también hay muchas    | 2
   | stored in the source column  | oraciones almacenadas | 2
   | as well.                     | en la columna fuente  | 2
...

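I have an "analysis" feature that basically compiles a summary of a project's source column. I can live with it taking 5 minutes, because users only run it once a week. However, my current implementation takes hours.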

SELECT * FROM project_row WHERE project_id = 2
SELECT * FROM glossary_term WHERE glossary_id = 2

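$data = array("exact" => 0, "fuzzy" => 0, "nothing" => 0); //Initialize the counters so ++ never hits an undefined key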

foreach($projectRow as $row){ //Up to 15k results
    //We need to find the best fuzzy match for this row's source in the glossary
    //I do this by looping each glossary term and running similar_text

    $highestMatchPercent = 0;

    $rowSrc = $row["source"];

    foreach($glossaryTerm as $term){ //Up to 40k results
        $termSrc = $term["source"];

        if(abs(strlen($rowSrc) - strlen($termSrc)) > 10) continue; //Added efficiency. If length difference more than 10, don't bother.

        similar_text($rowSrc, $termSrc, $matchPercent);
        if($matchPercent > $highestMatchPercent) $highestMatchPercent = $matchPercent;

        if($highestMatchPercent == 100) break; //Added efficiency
    }

    //Now we have the highest match percent for this row. This is the data that I need for every row.

    if($highestMatchPercent == 100){
        $data["exact"]++;
    } else if($highestMatchPercent > 75){
        $data["fuzzy"]++;
    }else{
        $data["nothing"]++;
    }
    
}

//Now I have the $data variable with a summary of what I need for this project...after many hours of running this process...

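Of course, going through 15k rows against 40k glossary terms takes a very long time. This nested loop is the expensive part: in the worst case it makes 15,000 x 40,000 = 600 million similar_text calls. I have tried to find a way to limit the MySQL term results for each row, but I have not found a good solution.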
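One way the per-row candidate set could be narrowed without touching MySQL is sketched below. It is only a sketch under assumptions: the function name summarizeMatches and its parameters are hypothetical, and both inputs are assumed to be plain arrays of source strings already loaded into PHP. It answers exact matches with a hash lookup, groups terms by length so that only terms within 10 characters of the row's length are scored, and caches scores for repeated row sources. It relies on similar_text reporting 100 percent only for identical non-empty strings, so the buckets should match the loop above except for empty sources.

```php
<?php
// Hypothetical helper: counts exact / fuzzy / nothing for a list of row sources.
function summarizeMatches(array $rowSources, array $termSources, int $maxLenDiff = 10): array
{
    $exact = [];    // term source => true, for constant-time exact lookups
    $byLength = []; // strlen => list of term sources with that length
    foreach ($termSources as $termSrc) {
        $exact[$termSrc] = true;
        $byLength[strlen($termSrc)][] = $termSrc;
    }

    $data = ['exact' => 0, 'fuzzy' => 0, 'nothing' => 0];
    $cache = []; // row source => best percent, so duplicate rows are scored once

    foreach ($rowSources as $rowSrc) {
        // Identical strings are the only way to reach 100%, so skip scoring them.
        if (isset($exact[$rowSrc])) {
            $data['exact']++;
            continue;
        }
        if (!isset($cache[$rowSrc])) {
            $best = 0.0;
            $len = strlen($rowSrc);
            // Only visit the length buckets the original length filter would allow.
            for ($l = max(0, $len - $maxLenDiff); $l <= $len + $maxLenDiff; $l++) {
                foreach ($byLength[$l] ?? [] as $termSrc) {
                    similar_text($rowSrc, $termSrc, $percent);
                    if ($percent > $best) {
                        $best = $percent;
                    }
                }
            }
            $cache[$rowSrc] = $best;
        }
        $data[$cache[$rowSrc] > 75 ? 'fuzzy' : 'nothing']++;
    }
    return $data;
}
```

How much this saves depends on the data: it helps most when many rows are exact matches or duplicates, and when term lengths are spread out enough that each row only touches a fraction of the glossary.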

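I have been told that fuzzy searching a large JSON file loaded into memory would be much faster than selecting the same data from the database and filtering it with PHP. But I don't know how I would implement something like that. My PHP would probably need to communicate with something like FlatDB and query results from it.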
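To make the idea concrete, here is a minimal sketch of what "JSON in memory" could look like in plain PHP, without any FlatDB involved. All function names and the file path are hypothetical, and it assumes the glossary has already been exported once as an array of rows with a source key. Note that it only replaces the database round trip: the scan inside bestMatchPercent is still one similar_text call per term.

```php
<?php
// Hypothetical export step: write the glossary rows to a JSON file once.
function saveGlossary(string $path, array $terms): void
{
    file_put_contents($path, json_encode($terms, JSON_UNESCAPED_UNICODE));
}

// Load the JSON file into memory and keep only the source strings.
function loadGlossarySources(string $path): array
{
    $terms = json_decode(file_get_contents($path), true, 512, JSON_THROW_ON_ERROR);
    return array_column($terms, 'source');
}

// Linear in-memory scan: best similar_text percentage for one needle.
function bestMatchPercent(string $needle, array $sources): float
{
    $best = 0.0;
    foreach ($sources as $src) {
        similar_text($needle, $src, $percent);
        $best = max($best, $percent);
    }
    return $best;
}
```

A possible usage, with a made-up path: call saveGlossary('/tmp/glossary_2.json', $glossaryTerm) after the glossary changes, then loadGlossarySources('/tmp/glossary_2.json') once per analysis run and reuse the returned array for every project row.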

Any ideas?