Removing stop words from a search string in PHP

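I'm having trouble with a PHP function that optimizes the search string for an MSSQL query.

I need to find an entry that looks like "the hobbit" by searching for a variant such as "hobbit the". I thought about cutting the articles out of the search string when they are followed by a space (in German we have 'der', 'die' and 'das').

My function looks like this: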

      public function optimizeSearchString($searchString)
      {
        $articles = [
          'der ',
          'die ',
          'das ',
          'the '
        ];


        foreach ($articles as $article) {
//only cut $article out of $searchString if its longer than the $article itself
          if (strlen($searchString) > strlen($article) && strpos($searchString, $article)) {
            $searchString = str_replace($article, '', $searchString);
            break;
          }
        }

        return $searchString;
      }

But this doesn't work.

Maybe there is a better solution using a regular expression?

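1.) To remove just one stop word from the start or end of the string, use `preg_replace` with a pattern made of these parts:

- `~` is the pattern delimiter
- `^` (caret) matches the start of the string
- `\W` (uppercase) matches a character that is not a word character
- `(der|die|das|the)` is the alternation of stop words in the first parenthesized group
- `\b` matches a word boundary
- `(?1)` pastes the pattern of the first group
- `$` matches right after the last character of the string
- the `i` (PCRE_CASELESS) flag is used; if the input is UTF-8, the `u` (PCRE_UTF8) flag is needed as well

Generating the pattern: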

// array containing stopwords
$stopwords = array("der", "die", "das", "the");

// escape the stopword array and implode with pipe
$s = '~^\W*('.implode("|", array_map("preg_quote", $stopwords)).')\W+\b|\b\W+(?1)\W*$~i';

// replace with emptystring
$searchString = preg_replace($s, "", $searchString);
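
To see what the generated pattern does, the sketch below runs it against a few sample strings. The sample titles are made up for illustration and are not from the original question.

```php
<?php
// Same construction as above: quote each stop word, join with "|".
$stopwords = array("der", "die", "das", "the");
$s = '~^\W*(' . implode("|", array_map("preg_quote", $stopwords)) . ')\W+\b|\b\W+(?1)\W*$~i';

// Illustrative inputs (assumptions, not taken from the question).
$samples = array("Der Hobbit", "Hobbit, der", "Derby day", "der");

$results = array();
foreach ($samples as $sample) {
    // A stop word is removed only at the very start or very end of the string.
    $results[$sample] = preg_replace($s, "", $sample);
}

print_r($results);
```

"Der Hobbit" and "Hobbit, der" both reduce to "Hobbit". "Derby day" is left alone because the stop word must be followed by a non-word character, and a lone "der" is kept because the pattern needs at least one other word next to it.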

Note that if the `~` delimiter occurs in the `$stopwords` array, it must also be escaped with a backslash.
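
One way to handle that escaping, sketched here as an assumption rather than as part of the original answer, is to pass the delimiter as the second argument of `preg_quote`. The helper name `buildStopwordPattern` and the stop word `a~b` are made up for illustration.

```php
<?php
// Hypothetical helper: builds the start/end pattern with "~" as the delimiter.
function buildStopwordPattern(array $stopwords)
{
    // The second argument makes preg_quote() escape the delimiter as well.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '~');
    }, $stopwords);

    return '~^\W*(' . implode('|', $quoted) . ')\W+\b|\b\W+(?1)\W*$~i';
}

// "a~b" is an artificial stop word that contains the delimiter.
$pattern = buildStopwordPattern(array('the', 'a~b'));
echo preg_replace($pattern, '', 'a~b Hobbit'), "\n";
```

Without the second argument the unescaped `~` would end the pattern early and `preg_replace` would reject it.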

2.) To remove stop words anywhere in the string, how about splitting it into words:

// words to be removed
$stopwords = array(
'der' => 1,
'die' => 1,
'das' => 1,
'the' => 1);
# used words as key for better performance

// remove stopwords from string
function strip_stopwords($str = "")
{
  global $stopwords;

  // 1.) break string into words
  // [^-\w\'] matches characters, that are not [0-9a-zA-Z_-']
  // if input is unicode/utf-8, the u flag is needed: /pattern/u
  $words = preg_split('/[^-\w\']+/', $str, -1, PREG_SPLIT_NO_EMPTY);

  // 2.) if we have at least 2 words, remove stopwords
  if(count($words) > 1)
  {
    $words = array_filter($words, function ($w) use (&$stopwords) {
      return !isset($stopwords[strtolower($w)]);
      # if utf-8: mb_strtolower($w, "utf-8")
    });
  }

  // check if not too much was removed such as "the the" would return empty
  if(!empty($words))
    return implode(" ", $words);
  return $str;
}
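
The comments in the function mention the `u` flag and `mb_strtolower` for UTF-8 input. A minimal sketch of that variant follows, assuming the mbstring extension is available. The function name `strip_stopwords_utf8` and the sample string are illustrative, and the stop words are passed in as a parameter instead of being read from a global.

```php
<?php
// Hypothetical UTF-8 variant of strip_stopwords(); requires ext-mbstring.
function strip_stopwords_utf8($str, array $stopwords)
{
    // The u flag makes PCRE treat pattern and subject as UTF-8,
    // so a word such as "Größe" is kept in one piece.
    $words = preg_split('/[^-\w\']+/u', $str, -1, PREG_SPLIT_NO_EMPTY);

    if ($words === false) {
        return $str; // invalid UTF-8 or another PCRE error: leave input untouched
    }

    if (count($words) > 1) {
        $words = array_filter($words, function ($w) use ($stopwords) {
            return !isset($stopwords[mb_strtolower($w, 'UTF-8')]);
        });
    }

    return empty($words) ? $str : implode(' ', $words);
}

$stopwords = array('der' => 1, 'die' => 1, 'das' => 1, 'the' => 1);
echo strip_stopwords_utf8('Die Größe', $stopwords), "\n";
```

The early return on `false` is a design choice of this sketch: with the `u` flag, `preg_split` fails on malformed UTF-8, and returning the original string keeps the search usable.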

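This solution also removes any punctuation other than `_`, `-` and `'`, because after removing the common words it implodes the remaining words with a space. The idea is to prepare the string for a query.

Neither solution modifies case, and both leave the string untouched if it consists of a single stop word.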
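
These properties of the word-splitting approach can be checked with a few inputs. The sketch below repeats the steps of `strip_stopwords` in a hypothetical helper named `strip_stopwords_demo` that takes the list as a parameter; the sample strings are invented for illustration.

```php
<?php
// Hypothetical helper: same steps as strip_stopwords(), list passed as parameter.
function strip_stopwords_demo($str, array $stopwords)
{
    // Split on anything that is not a word character, hyphen or apostrophe.
    $words = preg_split('/[^-\w\']+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    // Only filter when there are at least two words.
    if (count($words) > 1) {
        $words = array_filter($words, function ($w) use ($stopwords) {
            return !isset($stopwords[strtolower($w)]);
        });
    }

    // If everything was removed, fall back to the original input.
    return empty($words) ? $str : implode(' ', $words);
}

$stopwords = array('der' => 1, 'die' => 1, 'das' => 1, 'the' => 1);
echo strip_stopwords_demo('The Hobbit: an unexpected journey!', $stopwords), "\n";
echo strip_stopwords_demo('the', $stopwords), "\n";
echo strip_stopwords_demo('the the', $stopwords), "\n";
```

The first call drops the article together with the colon and exclamation mark, while the last two return their input unchanged.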

  public function optimizeSearchString($searchString = "")
  {
    $stopwords = array(
      'der' => 1,
      'die' => 1,
      'das' => 1,
      'the' => 1);

    $words = preg_split('/[^-\w\']+/', $searchString, -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) > 1) {
      $words = array_filter($words, function ($v) use (&$stopwords) {
        return !isset($stopwords[strtolower($v)]);
      }
      );
    }

    if (empty($words)) {
      return $searchString;
    }

    return implode(" ", $words);
  }

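public function optimizeSearchString($searchString) {
    // Assumed list: the original snippet used $stopwords without defining it in scope
    $stopwords = array('der', 'die', 'das', 'the');
    // Format 1 returns the words as an array (str_word_count is not multibyte-aware)
    $wordsFromSearchString = str_word_count($searchString, 1);
    // array_diff compares case-sensitively, so "Das" is not removed by 'das'
    $finalWords = array_diff($wordsFromSearchString, $stopwords);
    return implode(" ", $finalWords);
}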

`array_diff` compares case-sensitively. `array_udiff` with `strcasecmp` as the callback ignores case:

//Search string with article
$searchString = "Das blaue Haus"; //"The blue house"

//Split string into array. (This method is insufficient and doesn't account for compound nouns like "blue jay" or "einfamilienhaus".)
$wordArray = preg_split('/[^-\w\']+/', $searchString, -1, PREG_SPLIT_NO_EMPTY); 

var_dump(optimizeSearchString($wordArray));

function optimizeSearchString($wordArray) {
  $articles = array('der', 'die', 'das', 'the');
  $newArray = array_udiff($wordArray, $articles, 'strcasecmp');
  return $newArray;
}

array(2) {
  [1]=>
  string(5) "blaue"
  [2]=>
  string(4) "Haus"
}
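
The example above works on an array and returns an array. A string-in, string-out wrapper might look like the following sketch; the function name `removeArticles` and the fallback for an empty result are assumptions added here, not part of the original answer.

```php
<?php
// Hypothetical wrapper around the array_udiff() approach; the name is illustrative.
function removeArticles($searchString)
{
    $articles = array('der', 'die', 'das', 'the');
    $words = preg_split('/[^-\w\']+/', $searchString, -1, PREG_SPLIT_NO_EMPTY);

    // strcasecmp makes the comparison case-insensitive ("Das" equals "das").
    $kept = array_udiff($words, $articles, 'strcasecmp');

    // Fall back to the input if nothing is left (e.g. the search was just "das").
    return empty($kept) ? $searchString : implode(' ', $kept);
}

echo removeArticles('Das blaue Haus'), "\n";
```

Because `implode` ignores array keys, the gaps that `array_udiff` leaves in the key sequence do not matter here.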

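The length check `strlen($searchString) > strlen($article)` is fine, but the `strpos` test is not: `strpos` returns 0 when the article sits at the very start of the string, and 0 counts as false in the condition. Compare explicitly with `strpos(…) !== false`.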
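
A minimal sketch of the question's loop with the `strpos` result compared strictly against `false`. The function name `optimizeSearchStringStrict` is hypothetical, and nothing else is changed, so the loop is still case-sensitive and still matches `'der '` inside longer words.

```php
<?php
// Hypothetical rework of the question's loop; only the strpos() check changes.
function optimizeSearchStringStrict($searchString)
{
    $articles = array('der ', 'die ', 'das ', 'the ');

    foreach ($articles as $article) {
        // strpos() returns int(0) for a match at offset 0, so test against false.
        if (strlen($searchString) > strlen($article)
            && strpos($searchString, $article) !== false) {
            $searchString = str_replace($article, '', $searchString);
            break;
        }
    }

    return $searchString;
}

var_dump(strpos('the hobbit', 'the '));              // offset of the match
echo optimizeSearchStringStrict('the hobbit'), "\n"; // article at offset 0 is now removed
```

An article at the end of the string, as in "hobbit the", is still not removed, because the list entries carry a trailing space.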
