PHP preg_match_all on a synonyms file


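I am writing a PHP script that parses a file (synonyms.dat) and associates each list of synonyms with its parent word. The file contains about 150k words.

An example from the file: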

1|2
(adj)|one|i|ane|cardinal 
(noun)|one|I|ace|single|unity|digit|figure
1-dodecanol|1
(noun)|lauryl alcohol|alcohol
1-hitter|1
(noun)|one-hitter|baseball|baseball game|ball
10|2
(adj)|ten|x|cardinal 
(noun)|ten|X|tenner|decade|large integer
100|2
(adj)|hundred|a hundred|one hundred|c|cardinal 
(noun)|hundred|C|century|one C|centred|large integer
1000|2
(adj)|thousand|a thousand|one thousand|m|k|cardinal 
(noun)|thousand|one thousand|M|K|chiliad|G|grand|thou|yard|large integer
**10000|1
(noun)|ten thousand|myriad|large**

In the example above, I want to link ten thousand, myriad, and large to the word 10000.

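I have tried various approaches: reading the .dat file into memory with file_get_contents, exploding it at \n, and then using various array search techniques to find the "parent" word and its synonyms. However, this is extremely slow and more often than not crashes my web server.

I believe what I need to do is use preg_match_all to split up the string, then iterate over the matches and insert into my database where appropriate.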

$contents = file_get_contents($page);
preg_match_all("/([^\s]+)\|[0-9].*/",$contents,$out, PREG_SET_ORDER);

This matches the head-word lines, for example:

1|2

1-dodecanol|1

1-hitter|1

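But I don't know how to link the fields between each match (the synonyms themselves).

This script is intended to run once, to get all of the information into my database properly. For those interested, I have a table synonym_index that holds each word along with a unique id for that word. A second table, synonym_list, contains a word_id column and a synonym_id column, each of which is a foreign key to synonym_index. Each word_id can have multiple synonym_ids.

Thank you very much for your help.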

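Wow. For this type of functionality you have databases, with tables and indexes. PHP is meant to serve a request and a response, not to read a big file into memory. I suggest you put the data into a database. That will be much faster, and it is what databases are designed for.

You can use explode() to split each line into fields. (Or, depending on the precise format of the input, a different function may be a better choice.)

An illustrative example, which will almost certainly need adjusting for your specific use case and data format: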

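$infile = fopen('synonyms.dat', 'r');
if ($infile === false) {
    exit("Unable to open synonyms.dat\n");
}

// Loop until fgets() reports end of file, rather than polling feof(),
// which can hand one final false "line" to the loop body.
while (($line = fgets($infile)) !== false) {
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        continue;
    }

    // Line follows the format HEAD_WORD|NUMBER_OF_SYNONYM_LINES
    list($headWord, $n) = explode('|', $line);
    $n = (int) $n;
    $synonyms = array();

    // For each synonym line (stop early if the file is truncated)...
    while ($n-- > 0 && ($line = fgets($infile)) !== false) {
        $fields = explode('|', rtrim($line, "\r\n"));
        // The first field is the part of speech in parentheses, e.g. "(noun)"
        $partOfSpeech = substr(array_shift($fields), 1, -1);
        $synonyms[$partOfSpeech] = $fields;
    }

    // Now here, when $headWord is '**10000', $synonyms should be array(
    //     'noun' => array('ten thousand', 'myriad', 'large**')
    // )
}

fclose($infile);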
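
Once a head word and its synonyms have been parsed, they can be written to the two tables described in the question. The following is only a minimal sketch under stated assumptions: it uses PDO with an in-memory SQLite database, and the names synonym_index, synonym_list, word, and the helper wordId() are hypothetical stand-ins for the asker's real schema. INSERT OR IGNORE is SQLite syntax and would need adapting for another database.

```php
<?php
// Assumption: in-memory SQLite via PDO; swap the DSN for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema mirroring the description in the question.
$db->exec('CREATE TABLE synonym_index (id INTEGER PRIMARY KEY, word TEXT UNIQUE)');
$db->exec('CREATE TABLE synonym_list (
    word_id INTEGER REFERENCES synonym_index(id),
    synonym_id INTEGER REFERENCES synonym_index(id),
    UNIQUE (word_id, synonym_id)
)');

// Hypothetical helper: return the id of $word, inserting it on first sight.
function wordId(PDO $db, string $word): int
{
    $select = $db->prepare('SELECT id FROM synonym_index WHERE word = ?');
    $select->execute([$word]);
    $id = $select->fetchColumn();
    if ($id === false) {
        $db->prepare('INSERT INTO synonym_index (word) VALUES (?)')->execute([$word]);
        $id = $db->lastInsertId();
    }
    return (int) $id;
}

// Example values shaped like the parser's output for one head word.
$headWord = '1-dodecanol';
$synonyms = array('noun' => array('lauryl alcohol', 'alcohol'));

// Wrapping the inserts in a transaction avoids one commit per row.
$db->beginTransaction();
$link = $db->prepare(
    'INSERT OR IGNORE INTO synonym_list (word_id, synonym_id) VALUES (?, ?)'
);
$headId = wordId($db, $headWord);
foreach ($synonyms as $partOfSpeech => $words) {
    foreach ($words as $word) {
        // trim() drops the trailing spaces some lines in the sample carry.
        $link->execute([$headId, wordId($db, trim($word))]);
    }
}
$db->commit();
```

In the parsing loop, the block from beginTransaction() to commit() would run once per head word, or the whole import could share a single transaction. The part of speech is not stored here because the schema in the question has no column for it.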

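The intent is to run this script once, correctly, and save the data into the database I have created. I will update the question to reflect my database requirements. Sorry, I didn't think that was necessary for this question.

Apology accepted. :-) PleaseStand has provided a good solution. I would suggest using file(), which returns an array of lines. I always find that convenient.

Great! This is exactly what I was looking for. Contrary to your suggestion that changes would be needed, everything worked as written apart from a few variable names. It is far better than my original solution, and the inclusion of the $partOfSpeech variable is much appreciated.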
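
The file() suggestion from the comments could look roughly like the sketch below. It assumes the whole file fits comfortably in memory, which may not hold on a constrained web server. The function name parseSynonymLines() and the inline sample input are illustrative and not part of the original answer.

```php
<?php
// Hypothetical helper: turn an array of lines into head word => synonyms.
function parseSynonymLines(array $lines): array
{
    $result = array();
    $count = count($lines);
    for ($i = 0; $i < $count; $i++) {
        if (trim($lines[$i]) === '') {
            continue;
        }
        // Head line: HEAD_WORD|NUMBER_OF_SYNONYM_LINES
        list($headWord, $n) = explode('|', $lines[$i]);
        // Consume the next $n lines, stopping early if the input is truncated.
        for ($j = 0; $j < (int) $n && $i + 1 < $count; $j++) {
            $fields = explode('|', $lines[++$i]);
            // First field is the part of speech in parentheses, e.g. "(noun)".
            $partOfSpeech = substr(array_shift($fields), 1, -1);
            $result[$headWord][$partOfSpeech] = array_map('trim', $fields);
        }
    }
    return $result;
}

// With a real file: $lines = file('synonyms.dat', FILE_IGNORE_NEW_LINES);
$lines = array(
    '1-dodecanol|1',
    '(noun)|lauryl alcohol|alcohol',
    '10|2',
    '(adj)|ten|x|cardinal ',
    '(noun)|ten|X|tenner|decade|large integer',
);
$parsed = parseSynonymLines($lines);
print_r($parsed['10']['adj']); // the adjective synonyms recorded for "10"
```

Compared with the fgets() loop, this trades memory for simpler indexing: the array from file() can be walked with a plain counter, but all of the roughly 150k entries are held in memory at once.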