PHP: using preg_match_all to match all similar words/phrases


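I'm trying to create a pattern that matches all similar words/phrases in a string.

For example, I need to match: "this", "this is", "this is it", "that", "that was", "that was not".

It only matches the first occurrence of "this", but it should match all of them.

I even tried anchors and word boundaries, but nothing seems to work.

I tried (simplified):

It should output:

  • this
  • this is
  • this is it
  • that
  • that was
  • that was not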

You can also use the following regex:

/(this(?:\sis(?:\sit)?)?)/i
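
As a quick check of how that pattern behaves, here is a minimal sketch. The sample string is my own assumption, chosen only for illustration. Note that each occurrence yields just the longest phrase that fits at that position, so the shorter prefixes are not listed separately.

```php
<?php
// Hypothetical sample input, not taken from the question.
$content = "This is it! This is not it. This one.";

// Nested optional groups: "this", optionally followed by " is",
// which is in turn optionally followed by " it".
preg_match_all('/(this(?:\sis(?:\sit)?)?)/i', $content, $results);

// $results[1] holds what the capturing group matched at each occurrence.
print_r($results[1]);
```

With this input the three occurrences come back as "This is it", "This is" and "This", one entry per position in the string.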
How about:

$content = "this is it";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))/i', $content, $results);
print_r($results);
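Edit according to comments:

// Input assumed to contain both phrases (the same string as in the general example further down);
// with "this is it" alone, only the first of the two matches shown in the output would be found.
$content = "this is it! that was not!";
preg_match_all('/(?=(this))(?=(this is))(?=(this is it))|(?=(that))(?=(that was))(?=(that was not))/i', $content, $results);
print_r($results);
Output:
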
Array
(
    [0] => Array
        (
            [0] => 
            [1] => 
        )

    [1] => Array
        (
            [0] => this
            [1] => 
        )

    [2] => Array
        (
            [0] => this is
            [1] => 
        )

    [3] => Array
        (
            [0] => this is it
            [1] => 
        )

    [4] => Array
        (
            [0] => 
            [1] => that
        )

    [5] => Array
        (
            [0] => 
            [1] => that was
        )

    [6] => Array
        (
            [0] => 
            [1] => that was not
        )

)
More generally:

$content = "this is it! that was not!";
preg_match_all('/\b(?=(\w+))(?=(\w+ \w+))(?=(\w+ \w+ \w+))\b/i', $content, $results);
print_r($results);
Output:

Array
(
    [0] => Array
        (
            [0] => 
            [1] => 
        )

    [1] => Array
        (
            [0] => this
            [1] => that
        )

    [2] => Array
        (
            [0] => this is
            [1] => that was
        )

    [3] => Array
        (
            [0] => this is it
            [1] => that was not
        )

)
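The problem is that the shortest alternative comes first in your OR group.

PHP tries the alternatives in (this|this is|this is it) from left to right. As soon as one of them matches the test string, it leaves the group, so the longer alternatives are never tried.

This will work, because PHP tries the longest string first: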

/(this is it|this is|this)/i
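
To see the effect of the ordering, here is a small sketch that runs both variants on the same string. The shortest-first pattern is my reconstruction of what the question probably used, so treat it as an assumption; the variable names are also mine.

```php
<?php
$content = "this is it";

// Shortest alternative first: the first alternative already succeeds,
// so the engine never tries the longer ones.
preg_match_all('/(this|this is|this is it)/i', $content, $shortFirst);

// Longest alternative first: the whole phrase is tried before its prefixes.
preg_match_all('/(this is it|this is|this)/i', $content, $longFirst);

print_r($shortFirst[0]); // Array ( [0] => this )
print_r($longFirst[0]);  // Array ( [0] => this is it )
```

Either way there is only one match per position, so reordering alone returns the longest phrase but not every prefix of it.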


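Given that you are only capturing the terms you are searching for, you would probably be better off just using a foreach loop together with substr_count to see how many times each string occurs.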

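For example:

$haystack = "This is it! That was not! This is not a test!";
$needles = array(
    "this",
    "this is",
    "this is it",
    "that",
    "that was",
    "that was not");
foreach ($needles as $needle) {
    // substr_count is case sensitive, so make the subject and the search term lowercase
    $hits = substr_count(strtolower($haystack), strtolower($needle));
    echo "Search '$needle' occurs $hits time(s)" . PHP_EOL;
}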
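The above will output:

Search 'this' occurs 2 time(s)
Search 'this is' occurs 2 time(s)
Search 'this is it' occurs 1 time(s)
Search 'that' occurs 1 time(s)
Search 'that was' occurs 1 time(s)
Search 'that was not' occurs 1 time(s)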

If substr_count doesn't give you the flexibility you need, you can always replace it with preg_match_all and use your individual $needle values as the search term.
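
A minimal sketch of that substitution, assuming you want case-insensitive, whole-word matching (the \b anchors, the $counts array and the sample haystack are my additions, not part of the answer):

```php
<?php
// Sample data for illustration only.
$haystack = "This is it! That was not! This is not a test!";
$needles = array("this", "this is", "this is it", "that", "that was", "that was not");

$counts = array();
foreach ($needles as $needle) {
    // preg_quote escapes any regex metacharacters in the needle;
    // \b limits hits to whole words and the i flag ignores case.
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/i';

    // preg_match_all returns the number of full pattern matches.
    $counts[$needle] = preg_match_all($pattern, $haystack);
}

print_r($counts);
```

Compared with substr_count, this no longer counts a needle that only appears inside a longer word, and it needs no strtolower calls.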

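Use /(this is(it)?)/

Question: if you only want to capture the substrings you are specifically searching for, why not settle for a foreach loop and substr_count?

@Mr.Llama, would you mind posting an example?

@Tom - posted as an answer:

But this doesn't answer the question at all.

Thanks, that makes sense, but it still doesn't show all possible results in the array. Any ideas?

Ah, I think I see why it only shows "this is it" in the array. The test string I'm using only contains "this is it", whereas yours contains every word/phrase. Is there a way to get all the matches using only "this is it"?

Thanks, that works, but is there a way to add more words to the pattern that don't start with "this"? For example, so that it also matches something different (besides the above): "red", "red apple", "red apple tree"?

@Tom: Sorry, I don't understand what you mean. You may want to edit your question and give some sample strings and the expected results.

No, it needs to match those exact words/phrases. Your first example works great, I just need to add more possible matches to the pattern.

Thanks! Awesome! The only problem is that if the string contains only part of the words/phrases, it doesn't match. For example, if the string only contains "that was", it doesn't match anything. Any ideas?

Thanks for the example. So if I want to put all the matches into an array, how would I do that? Sorry, I'm not the best PHP programmer.

The matches themselves won't matter much, since you already know what they are (you are literally just searching for them). However, if you want to store them, you could do something like for ($i = 0; $i < $hits; $i++) { $matches[] = $needle; } after the echo statement.

Thanks, that works. But how do I remove duplicate matches from the array, so that if "this is" matches twice it is stored in the array only once? Would the array_unique function work? Or is there something better?

@Tom - just remove the for loop and use if ($hits > 0) { $matches[] = $needle; }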
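
Putting that last suggestion together, a rough sketch could look like the following. The haystack here is a made-up sample, and $matches simply ends up holding each phrase that occurs at least once.

```php
<?php
// Sample input for illustration only; it contains no "that" phrases.
$haystack = "This is it! This is not a test!";
$needles = array("this", "this is", "this is it", "that", "that was", "that was not");

$matches = array();
foreach ($needles as $needle) {
    // substr_count is case sensitive, so compare lowercase copies.
    $hits = substr_count(strtolower($haystack), strtolower($needle));

    // Store each phrase at most once, however many times it occurs.
    if ($hits > 0) {
        $matches[] = $needle;
    }
}

print_r($matches);
```

Because every needle is tested exactly once, no array_unique call is needed afterwards.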