elasticsearch: How do I search by words written together in the query when they are stored as separate words in the data?
I have some documents, say with a field name. A name may consist of several separately written words, for example:
{
  "name": "first document"
},
{
  "name": "second document"
}
My goal is to be able to find these documents with the search strings:
firstdocument, seconddocumen
As you can see, the search strings are misspelled, yet they would still match these documents if the spaces were removed from the document names. This could be solved by creating another field holding the same string without spaces, but that looks like redundant data, unless there is no other way to do it.
I need something like:
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "shingle",
      "max_shingle_size": 3,
      "min_shingle_size": 2,
      "output_unigrams": "true",
      "token_separator": ""
    }
  ],
  "text": "first document"
}
But the other way around. Instead of applying it to the search text, I need it applied to the search target (the document names), so that documents can be found even when the search text is slightly misspelled. How should this be done?

I suggest using an analyzer that removes whitespace:
Analyzer
"no_spaces": {
"filter": [
"lowercase"
],
"char_filter": [
"remove_spaces"
],
"tokenizer": "standard"
}
"remove_spaces": {
"type": "pattern_replace",
"pattern": "[ ]",
"replacement": ""
}
Char filter
"no_spaces": {
"filter": [
"lowercase"
],
"char_filter": [
"remove_spaces"
],
"tokenizer": "standard"
}
"remove_spaces": {
"type": "pattern_replace",
"pattern": "[ ]",
"replacement": ""
}
Field mapping
"name": {
"type": "text",
"fields": {
"without_spaces": {
"type": "text",
"analyzer": "no_spaces"
}
}
}
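Taken together, the analyzer, char filter, and mapping fragments above might be combined into a single index-creation request, roughly like this (a sketch; the index name my_index is my own placeholder):

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "remove_spaces": {
          "type": "pattern_replace",
          "pattern": "[ ]",
          "replacement": ""
        }
      },
      "analyzer": {
        "no_spaces": {
          "tokenizer": "standard",
          "char_filter": ["remove_spaces"],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "without_spaces": {
            "type": "text",
            "analyzer": "no_spaces"
          }
        }
      }
    }
  }
}
```

Since the char filter strips spaces before the standard tokenizer runs, "first document" should be indexed as the single token firstdocument in the name.without_spaces field.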
Query
GET /_search
{
  "query": {
    "match": {
      "name.without_spaces": {
        "query": "seconddocumen",
        "fuzziness": "AUTO"
      }
    }
  }
}
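Why the misspelled seconddocumen still matches can be sketched in a few lines of Python (a simplified model of the idea, not Elasticsearch's actual implementation; AUTO fuzziness allows 0 edits for terms up to 2 characters, 1 edit up to 5, and 2 edits beyond that):

```python
def normalize(text):
    # Mimics the no_spaces analyzer: lowercase, then strip spaces.
    return text.lower().replace(" ", "")

def levenshtein(a, b):
    # Standard dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def auto_fuzziness(term):
    # AUTO: 0 edits for length 0-2, 1 for 3-5, 2 for 6 and longer.
    n = len(term)
    return 0 if n <= 2 else 1 if n <= 5 else 2

def matches(query, name):
    q, n = normalize(query), normalize(name)
    return levenshtein(q, n) <= auto_fuzziness(q)

print(matches("seconddocumen", "second document"))  # True: one edit away
print(matches("firstdocument", "first document"))   # True: exact after normalization
```

Here "seconddocumen" normalizes to a 13-character term, so up to 2 edits are tolerated, and it is only 1 edit away from "seconddocument".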
Edit: For completeness: besides the remove_spaces filter, a shingle filter can also be used:
"analysis": {
"filter": {
"shingle_filter": {
"type": "shingle",
"output_unigrams": "false",
"token_separator": ""
}
},
"analyzer": {
"shingle_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"shingle_filter"
]
}
}
}
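To see what the shingle analyzer actually emits, the _analyze API can be run against an index that defines it (a sketch; my_index is a placeholder). With the settings above, "first document" should come out as the single token firstdocument, since unigrams are disabled and the token separator is empty:

```
GET /my_index/_analyze
{
  "analyzer": "shingle_analyzer",
  "text": "first document"
}
```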