Understanding analyzers, filters and queries in Elasticsearch


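I'm trying to get my head around when I should use analyzers, filters and queries. I have read the in-depth search articles on the elastic.co website and understand things better now, but the examples there are too simple for my use case and I'm still somewhat confused.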

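Given that my documents contain a list of ingredients, made up of any mix of `digestive biscuits`, `biscuits`, `cheese`, and `chocolate`, I'm trying to work out the best way to analyse that data and search over it.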

Here is a simple set of documents:

[{
    "ingredients": ["cheese", "chocolate"]
}, {
    "ingredients": ["chocolate", "biscuits"]
}, {
    "ingredients": ["cheese", "biscuits"]
}, {
    "ingredients": ["chocolate", "digestive biscuits"]
}, {
    "ingredients": ["cheese", "digestive biscuits"]
}, {
    "ingredients": ["cheese", "chocolate", "biscuits"]
}, {
    "ingredients": ["cheese", "chocolate", "digestive biscuits"]
}]

(I have deliberately not mixed `biscuits` and `digestive biscuits` in the same document; I'll explain why shortly.)

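I have a search field that lets people type in any ingredients they choose. At the moment I split the input on whitespace, which gives me an array of terms to work with.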

I have a mapping like this:

{
    "properties": {
        "ingredients": {
            "type": "string",
            "analyzer": "keyword"
        }
    }
}

The problem I'm facing is that `biscuits` does not match `digestive biscuits`, and `biscuit` does not match anything.
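The behaviour described above follows from how the `keyword` analyzer works: it emits the whole field value as one unchanged token, so only an exact, full-string match succeeds. The following is a minimal Python sketch of that idea, not Elasticsearch code; `keyword_analyze` and `term_matches` are hypothetical helper names introduced only for illustration, and a singular search term such as `biscuit` is assumed.

```python
def keyword_analyze(value):
    # The keyword analyzer keeps the entire input as a single token, unchanged.
    return [value]


def term_matches(query, field_values):
    # Collect the tokens that would be indexed for one document's field values.
    indexed = {tok for value in field_values for tok in keyword_analyze(value)}
    # The query text goes through the same analyzer before the lookup.
    return any(tok in indexed for tok in keyword_analyze(query))


doc = ["chocolate", "digestive biscuits"]
print(term_matches("digestive biscuits", doc))          # exact value: matches
print(term_matches("biscuits", doc))                    # part of a value: no match
print(term_matches("biscuit", ["cheese", "biscuits"]))  # singular form: no match
```

With whole-value tokens there is nothing for a partial or singular term to hit, which is why the field needs an analyzer that splits and normalizes the text.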

I know I probably have to analyse the field with the `snowball` analyzer, but I'm not sure how to go about it.

Do I need a multi-field approach? Do I also need to use filters alongside the query? The results I'd like to see are:

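- `biscuit` matches `biscuits` and `digestive biscuits` (the latter with a lower score)
- `biscuits` matches `biscuits` and `digestive biscuits` (the latter with a lower score)
- `digestive` matches `digestive biscuits`
- `digestive biscuits` matches itself and `biscuits` (the latter with a lower score)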

In addition, if any other terms are thrown into the mix, how do I handle that? With a filter, or with a query?

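I'm very confused about how to structure this properly, from the index through the mapping to the search, so if anyone has example suggestions I'd be most grateful.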

First of all, I'd suggest reading the following:

It discusses exactly the problem you are facing.

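To solve this you have to use a custom analyzer, which is built from character filters, a tokenizer, and token filters. An analyzer emits tokens from a blob of text.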

So, for your specific case, I'll show you how to create a simple custom analyzer that achieves what you want:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_custom": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "kstem"
          ]
        }
      }
    }
  },
  "mappings": {
    "data": {
      "properties": {
        "ingredients": {
          "type": "string",
          "analyzer": "my_analyzer_custom"
        }
      }
    }
  }
}

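This analyzer splits the text with the standard tokenizer and applies the following filters:

- `asciifolding`: normalizes letters with accents (é => e)
- `lowercase`: lowercases tokens, so searches are case-insensitive
- `kstem`: a stemming filter that normalizes tokens to their root form (not ideal, but it works well enough). In this case it normalizes `biscuits` to `biscuit`.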
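To make the filter chain above concrete, here is a rough Python imitation of what the analyzer does to a piece of text. It is only a sketch under stated assumptions: the tokenizer is approximated by splitting on non-word characters, and `toy_stem` is a deliberately naive plural stripper standing in for `kstem`, not the real KStem algorithm. All function names are hypothetical.

```python
import re
import unicodedata


def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: split on non-word characters.
    return [tok for tok in re.split(r"\W+", text) if tok]


def ascii_fold(token):
    # Decompose accented characters and drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(token):
    # NOT KStem: only strips a simple trailing plural "s" for illustration.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    # Same order as the custom analyzer: asciifolding, lowercase, stemmer.
    tokens = standard_tokenize(text)
    tokens = [ascii_fold(tok) for tok in tokens]
    tokens = [tok.lower() for tok in tokens]
    return [toy_stem(tok) for tok in tokens]


print(analyze("Digestive Biscuits"))
print(analyze("biscuit"))
```

Because the indexed value and the search text pass through the same chain, both `biscuit` and `biscuits` end up as the same token, and `digestive biscuits` contributes that token as well.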

Here is your sample data:

PUT /test/data/1
{
  "ingredients": ["cheese", "chocolate"]
}
PUT /test/data/2
{
  "ingredients": ["chocolate", "biscuits"]
}
PUT /test/data/3
{
  "ingredients": ["cheese", "biscuits"]
}
PUT /test/data/4
{
  "ingredients": ["chocolate", "digestive biscuits"]
}
PUT /test/data/5
{
  "ingredients": ["cheese", "digestive biscuits"]
}
PUT /test/data/6
{
  "ingredients": ["cheese", "chocolate", "biscuits"]
}
PUT /test/data/7
{
  "ingredients": ["cheese", "chocolate", "digestive biscuits"]
}

And this query:

GET /test/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.5,
      "queries": [
        {
          "match": {
            "ingredients": {
              "query": "digestive biscuits",
              "type": "phrase",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "ingredients": {
              "query": "digestive biscuits",
              "operator": "and",
              "boost": 3
            }
          }
        },
        {
          "match": {
            "ingredients": {
              "query": "digestive biscuits"
            }
          }
        }
      ]
    }
  }
}

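I have used a `dis_max` query in this case. See the array of queries? We specify multiple queries there, and the documents that score highest come back first. From the documentation: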

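A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.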
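The scoring rule quoted above can be written out as a small calculation. This is a simplified Python sketch, not how Lucene computes scores internally: the sub-query scores are made-up inputs, the function name `dis_max_score` is hypothetical, and the outer `boost` is assumed to be a plain multiplier on the combined score.

```python
def dis_max_score(sub_scores, tie_breaker=0.0, boost=1.0):
    # Keep only the sub-queries that matched this document.
    matching = [score for score in sub_scores if score is not None]
    if not matching:
        return None  # no sub-query matched, so the document is not returned
    best = max(matching)
    # Every other matching sub-query contributes only a fraction of its score.
    others = sum(matching) - best
    return boost * (best + tie_breaker * others)


# Hypothetical scores for one document: phrase, operator "and", plain match.
print(dis_max_score([5.0, 3.0, 1.0], tie_breaker=0.7, boost=1.5))
# A document matched only by the plain match query.
print(dis_max_score([None, None, 1.0], tie_breaker=0.7, boost=1.5))
```

With `tie_breaker` at 0 only the best sub-query counts; at 1 the scores are simply added. The value 0.7 used in the query sits between the two.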

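In this example I specified three queries:

- A match phrase query. The query must match on both terms and positions.
- A match with `"operator": "and"`, which means all terms must match, regardless of their order.
- A simple match query, which means any one of the tokens must match.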

You can see that I specified a different boost value for each one; that is how you prioritize their importance.
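Since the search text comes from a free-text field, the request body can be generated rather than written by hand. The sketch below builds the same `dis_max` structure as the query shown earlier for any input string. The function name `build_ingredient_query` is hypothetical; the boosts and `tie_breaker` are copied from that query, and the match-query `"type": "phrase"` syntax is assumed to be supported by the Elasticsearch version in use, as it is in the example.

```python
import json


def build_ingredient_query(text, field="ingredients"):
    def match(extra):
        # Every clause searches the same text; only the options differ.
        clause = {"query": text}
        clause.update(extra)
        return {"match": {field: clause}}

    return {
        "query": {
            "dis_max": {
                "tie_breaker": 0.7,
                "boost": 1.5,
                "queries": [
                    match({"type": "phrase", "boost": 5}),   # terms and positions
                    match({"operator": "and", "boost": 3}),  # all terms, any order
                    match({}),                               # any term
                ],
            }
        }
    }


# Print the JSON body that would be sent to GET /test/_search.
print(json.dumps(build_ingredient_query("digestive biscuits"), indent=2))
```

Passing the whole input string and letting the analyzer tokenize it means the text does not have to be split on whitespace in the application first.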

I hope this helps.