Java Elasticsearch: excluding documents that contain a specific term
I have documents indexed in Elasticsearch like the one below:
{
"category": "clothing (f)",
"description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
"name": "Women's Unstoppable Graphic T-Shirt",
"price": "$34.99"
}
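There are categories such as clothing (m), clothing (f), and so on. When the search is for women's items, I am trying to exclude items in the clothing (m) category. The query I am trying is: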
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "description": "women's black shirt"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "category": "clothing (m)"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 50
}
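But this does not work as expected. The results always include a few clothing (m) documents along with the other documents. How can I exclude documents that have a particular category?

To exclude a specific term (exact match), you have to use the keyword datatype.

Keyword datatypes are typically used for filtering (find all blog posts whose status is published), sorting, and aggregations. Keyword fields are only searchable by their exact value.

Your current query returns clothing (m) in the results because, when you indexed the documents, the category value was analyzed with the Elasticsearch standard analyzer, which splits clothing (m) into the tokens clothing and m. In your query, you searched category as a text datatype field. A term query does not analyze its input, so the single term clothing (m) never equals either indexed token, and the must_not clause excludes nothing.

Text datatype fields are analyzed; that is, before indexing they are passed through an analyzer that converts the string into a list of individual terms.

Run the following command: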
POST my_index/_analyze
{
"text": ["clothing (m)"]
}
Result:
{
  "tokens" : [
    {
      "token" : "clothing",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "m",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
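The token output above explains why an exact term lookup fails on an analyzed field. The following is a minimal Python sketch of that reasoning, not Elasticsearch code: `standard_like_tokens` is a hypothetical, rough stand-in for the standard analyzer (it only lowercases and splits on non-alphanumeric characters, which is enough for this example but not a faithful reimplementation), and the two `term_matches_*` helpers are illustrative names.

```python
import re


def standard_like_tokens(text):
    """Rough approximation of the standard analyzer for this example only:
    lowercase the text and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def term_matches_text_field(indexed_value, term):
    # A term query is not analyzed, so the term must be identical
    # to one of the tokens stored for the analyzed (text) field.
    return term in standard_like_tokens(indexed_value)


def term_matches_keyword_field(indexed_value, term):
    # A keyword field stores the whole string as a single term.
    return term == indexed_value


print(standard_like_tokens("clothing (m)"))
print(term_matches_text_field("clothing (m)", "clothing (m)"))
print(term_matches_keyword_field("clothing (m)", "clothing (m)"))
```

Under these assumptions, the term `clothing (m)` never equals either stored token of the text field, so a `must_not` built on it removes nothing, whereas the same term compared against the unanalyzed value matches exactly.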
Let's post some documents:
POST my_index/_doc/1
{
  "category": "clothing (f)",
  "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
  "name": "Women's Unstoppable Graphic T-Shirt",
  "price": "$34.99"
}
POST my_index/_doc/2
{
  "category": "clothing (m)",
  "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
  "name": "Women's Unstoppable Graphic T-Shirt",
  "price": "$34.99"
}
Now our query should look like this:
GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "description": "women's black shirt"
        }
      },
      "filter": {
        "bool": {
          "must_not": {
            "term": {
              "category.keyword": "clothing (m)"
            }
          }
        }
      }
    }
  },
  "from": 0,
  "size": 50
}
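If the request body is assembled in application code, the query above could be built along these lines. This is a sketch only: `build_exclusion_query` is a hypothetical helper name, and it assumes the index uses the default dynamic mapping, so that `category` has a `category.keyword` subfield. The resulting dictionary would be sent as the JSON body of `GET my_index/_search`.

```python
import json


def build_exclusion_query(search_text, excluded_category, size=50):
    """Build a search body that matches on description and excludes
    one exact category value via the (assumed) category.keyword subfield."""
    return {
        "query": {
            "bool": {
                # Full-text match on the analyzed description field.
                "must": {"match": {"description": search_text}},
                # Exact-value exclusion; mirrors the filter/bool/must_not
                # structure of the console query above.
                "filter": {
                    "bool": {
                        "must_not": {
                            "term": {"category.keyword": excluded_category}
                        }
                    }
                },
            }
        },
        "from": 0,
        "size": size,
    }


body = build_exclusion_query("women's black shirt", "clothing (m)")
print(json.dumps(body, indent=2))
```

Keeping the excluded value as a parameter makes it easy to pass the exact stored string, including case and punctuation, which is what a term query on a keyword field compares against.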
The result is:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.43301374,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (f)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      }
    ]
  }
}
Result without using keyword:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.43301374,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (f)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (m)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      }
    ]
  }
}
As you can see from the last result, we also got clothing (m).
By the way, for the text datatype, don't use term. Use match.
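One caveat when applying that advice to this exclusion: a match query analyzes the search text and, with the default `or` operator, matches a document if any resulting token is present. The sketch below simulates this in plain Python; `analyze` is a hypothetical, simplified stand-in for the standard analyzer and `match_matches` is an illustrative name, not an Elasticsearch API.

```python
import re


def analyze(text):
    """Simplified stand-in for the standard analyzer (assumption):
    lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match_matches(indexed_value, query_text, operator="or"):
    """Simulate a match query against an analyzed text field."""
    doc_tokens = set(analyze(indexed_value))
    hits = [token in doc_tokens for token in analyze(query_text)]
    # "or" needs at least one query token in the document, "and" needs all.
    return any(hits) if operator == "or" else all(hits)


# With the default operator, "clothing (m)" also matches "clothing (f)"
# because both values contain the token "clothing".
print(match_matches("clothing (f)", "clothing (m)"))
print(match_matches("clothing (f)", "clothing (m)", operator="and"))
```

Under these assumptions, a `must_not` built on a plain match against the text `category` field would also drop clothing (f) documents, so for an exact exclusion the keyword subfield with a term query remains the more predictable choice.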
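Hope this helps.

It does not exclude the terms in all cases.

What do you mean? Not getting the expected result? Could you give an example of "not getting the expected result"? My example works.

Yes, it also worked for me for a while. But I still got data for the keyword I excluded. So this is definitely not the solution.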