PHP: How to fuzzy search JSON in memory instead of MySQL
The project_row table, with up to 15k rows per project_id:
ID | source | target | project_id
----------------------
0 | hello world | hola tierra | 2
1 | hello | hola | 2
2 | what | | 2
3 | you | tu | 2
4 | nevermind. I will head home. | | 2
5 | There are many sentences | | 2
| stored in the source column | | 2
| as well! | | 2
...
The glossary_term table, with up to 40k rows per glossary_id:
ID | source | target | glossary_id
----------------------
0 | hello world | hola tierra | 2
1 | hello word | hola palabra | 2
2 | what? | que? | 2
3 | your | tu | 2
4 | I will head home. | Me voy a la casa | 2
5 | There are many sentences | también hay muchas | 2
| stored in the source column | oraciones almacenadas | 2
| as well. | en la columna fuente | 2
...
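I have an "Analyze" feature that basically compiles a summary of a project's source column. It would be fine for this to take 5 minutes, since users only run it once a week. However, my current implementation takes several hours: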
SELECT * FROM project_row
SELECT * FROM glossary_row WHERE glossary_id = 2
// $projectRow and $glossaryTerm hold the result sets of the two queries above.
// Initialize the counters so that ++ does not hit undefined keys.
$data = array("exact" => 0, "fuzzy" => 0, "nothing" => 0);
foreach ($projectRow as $row) { // Up to 15k results
    // We need to find the best fuzzy match for this row's source in the glossary.
    // I do this by looping over each glossary term and running similar_text.
    $highestMatchPercent = 0;
    $rowSrc = $row["source"];
    foreach ($glossaryTerm as $term) { // Up to 40k results
        $termSrc = $term["source"];
        // Added efficiency: if the length difference is more than 10, don't bother.
        if (abs(strlen($rowSrc) - strlen($termSrc)) > 10) {
            continue;
        }
        similar_text($rowSrc, $termSrc, $matchPercent);
        if ($matchPercent > $highestMatchPercent) {
            $highestMatchPercent = $matchPercent;
        }
        // Added efficiency: nothing can beat an exact match.
        if ($highestMatchPercent == 100) {
            break;
        }
    }
    // Now we have the highest match percent for this row. This is the data that I need for every row.
    if ($highestMatchPercent == 100) {
        $data["exact"]++;
    } elseif ($highestMatchPercent > 75) {
        $data["fuzzy"]++;
    } else {
        $data["nothing"]++;
    }
}
// Now I have the $data variable with a summary of what I need for this project... after many hours of running this process...
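Of course, comparing 15k rows against 40k glossary terms takes a long time: in the worst case the nested loop makes about 600 million similar_text calls. I have tried to find ways to limit the glossary terms that MySQL returns for each row, but I have not found a good solution.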
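To make the cost of the nested loop concrete, here is a minimal sketch of how the inner loop might be narrowed in plain PHP before touching any storage layer. It assumes the glossary sources fit in memory as an array of strings; the function names (`buildGlossaryIndex`, `bestMatchPercent`, `summarize`) are hypothetical and not part of my current code. Exact matches are answered by an array key lookup, the length filter becomes a lookup in per-length buckets, and repeated source strings are only scored once.

```php
<?php
// Hypothetical helpers for illustration only.

// Index glossary sources by exact string and by byte length.
function buildGlossaryIndex(array $glossarySources): array
{
    $exact = [];
    $byLength = [];
    foreach ($glossarySources as $src) {
        $exact[$src] = true;
        $byLength[strlen($src)][] = $src;
    }
    return ['exact' => $exact, 'byLength' => $byLength];
}

// Best similar_text percentage for one source string, only looking at
// glossary terms whose length differs by at most $maxLenDiff.
function bestMatchPercent(string $rowSrc, array $index, int $maxLenDiff = 10): float
{
    // similar_text reports 100 only for identical non-empty strings,
    // so an exact hit can skip the scan entirely.
    if ($rowSrc !== '' && isset($index['exact'][$rowSrc])) {
        return 100.0;
    }
    $best = 0.0;
    $len = strlen($rowSrc);
    for ($l = max(0, $len - $maxLenDiff); $l <= $len + $maxLenDiff; $l++) {
        foreach ($index['byLength'][$l] ?? [] as $termSrc) {
            similar_text($rowSrc, $termSrc, $percent);
            if ($percent > $best) {
                $best = $percent;
            }
        }
    }
    return $best;
}

// Same exact / fuzzy / nothing buckets as in the loop above.
function summarize(array $rowSources, array $glossarySources): array
{
    $index = buildGlossaryIndex($glossarySources);
    $data = ['exact' => 0, 'fuzzy' => 0, 'nothing' => 0];
    $cache = []; // identical source strings are scored once
    foreach ($rowSources as $rowSrc) {
        if (!isset($cache[$rowSrc])) {
            $cache[$rowSrc] = bestMatchPercent($rowSrc, $index);
        }
        $percent = $cache[$rowSrc];
        if ($percent == 100) {
            $data['exact']++;
        } elseif ($percent > 75) {
            $data['fuzzy']++;
        } else {
            $data['nothing']++;
        }
    }
    return $data;
}
```

This keeps the same thresholds, but rows without an exact match are still compared against every term in their length window, so the gain depends on how many exact and repeated strings a project contains.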
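I have been told that fuzzy searching a large JSON file loaded in memory would be much faster than selecting the same data from the database and filtering it with PHP. But I don't know how I would implement something like that. My PHP would probably need to communicate with something like FlatDB and query results from it.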
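As far as I understand the suggestion, the "JSON in memory" part could look roughly like the sketch below: dump the glossary sources to a JSON file once, then decode that file into a PHP array at the start of the analysis. The function names and the file layout (a flat JSON array of source strings) are assumptions for illustration, and FlatDB is not involved here. On its own this only replaces the glossary SELECT; the number of similar_text comparisons stays the same.

```php
<?php
// Hypothetical helpers; path and JSON layout are assumptions.

// Write the distinct glossary source strings to a JSON file.
function exportGlossaryToJson(array $glossaryRows, string $path): void
{
    $sources = array_values(array_unique(array_column($glossaryRows, 'source')));
    $json = json_encode($sources, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
    if (file_put_contents($path, $json) === false) {
        throw new RuntimeException("Could not write $path");
    }
}

// Load the JSON file back into a plain PHP array of strings.
function loadGlossarySources(string $path): array
{
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Could not read $path");
    }
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}
```

The loaded array can then be looped over exactly like the glossary result set in the code above.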
Any ideas?