Elasticsearch filter to show only documents whose values are all in a given list
Using the following script, I set up Elasticsearch to index articles. Each article can have a list of authors:
#!/usr/bin/env bash
HOST='http://localhost:9200/articles'
curl -XPUT ${HOST}
curl -XPUT "${HOST}/article/_mapping" -d '{
"article" : {
"properties" : {
"authors" : {"type":"string", "index_name":"author", "index": "not_analyzed"}
}
}
}'
curl -XPOST "${HOST}/article/1" -d '{
"authors" : ["Albert","Wolfgang","Richard","Murray"],
"message" : "Blabla" }'
curl -XPOST "${HOST}/article/2" -d '{
"authors" : ["Albert","Richard"],
"message" : "Blublu" }'
curl -XPOST "${HOST}/article/3" -d '{
"authors" : ["Albert"],
"message" : "Bleble" }'
What I want to do is filter out every article that has an author who is not in a given list. I tried the following query:
curl -XGET "${HOST}/_search?pretty=true" -d '{
"query": {
"constant_score": {
"filter": {
"terms": {
"authors": ["Albert","Richard","Erwin"],
"execution": "or"
}
}
}
}
}'
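However, this returns all three articles as hits. I want article 1 to be filtered out, because some of its authors, ["Wolfgang","Murray"], are not in the given list of authors ["Albert","Richard","Erwin"]. Is this possible with Elasticsearch?

This is a tricky one. If I understand correctly, you need to find documents whose authors field contains one (or more) of these three values and nothing else.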
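To make the difference concrete, here is a minimal in-memory sketch in plain Python (it does not talk to Elasticsearch). It contrasts the "at least one author is in the list" check that the terms filter performs with the "every author is in the list" check the question asks for. The helper names `matches_any` and `matches_only` are made up for illustration.

```python
# Sample documents from the question (article id -> authors).
articles = {
    1: ["Albert", "Wolfgang", "Richard", "Murray"],
    2: ["Albert", "Richard"],
    3: ["Albert"],
}
allowed = {"Albert", "Richard", "Erwin"}


def matches_any(authors, allowed):
    """Roughly what the terms filter checks: at least one author is allowed."""
    return any(author in allowed for author in authors)


def matches_only(authors, allowed):
    """What the question asks for: every author is allowed."""
    return set(authors) <= allowed


any_hits = [doc_id for doc_id, authors in articles.items() if matches_any(authors, allowed)]
only_hits = [doc_id for doc_id, authors in articles.items() if matches_only(authors, allowed)]

print(any_hits)   # [1, 2, 3]
print(only_hits)  # [2, 3]
```

The second check is a subset test, which is the part a plain terms filter cannot express on its own.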
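According to Elasticsearch: The Definitive Guide, checking whether a field contains only certain values is fairly expensive.
Inspired by that, there is an ugly solution that combines bool filters with an added author_count field.
You have to check whether:
- authors contains one of Albert/Richard/Erwin and the author count equals 1
- authors contains one of the possible pairs of authors (Albert & Richard, Erwin & Albert, and so on) and the author count equals 2
- authors contains all 3 authors and the author count equals 3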
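The cases above are the non-empty subsets of the allowed authors, so a list of n names needs 2^n - 1 conditions. The following sketch enumerates them with Python's standard `itertools.combinations`; the function name `author_combinations` is hypothetical and only serves the illustration.

```python
from itertools import combinations


def author_combinations(allowed):
    """Return (count, combination) pairs for every non-empty subset of allowed."""
    cases = []
    for size in range(1, len(allowed) + 1):
        for combo in combinations(allowed, size):
            cases.append((size, combo))
    return cases


allowed = ["Albert", "Richard", "Erwin"]
cases = author_combinations(allowed)

for size, combo in cases:
    print(size, combo)

print(len(cases))  # 7, i.e. 2**3 - 1
```

With ten allowed authors this already means 1023 conditions, which is why the approach is described as ugly.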
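Writing all of this out by hand was not feasible for me. As Tom83 suggested, and as described in the linked Elasticsearch definitive guide, it can be done by adding an authors_counts property at index time. Like this: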
curl -XPOST "${HOST}/article/1" -d '{
"authors": [
"Albert",
"Wolfgang",
"Richard",
"Murray"
],
"authors_counts": 4,
"message": "Blabla"
}'
curl -XPOST "${HOST}/article/2" -d '{
"authors": [
"Albert",
"Richard"
],
"authors_counts": 2,
"message": "Blublu"
}'
curl -XPOST "${HOST}/article/3" -d '{
"authors": [
"Albert"
],
"authors_counts": 1,
"message": "Bleble"
}'
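Rather than typing the count by hand, it could be derived just before the document is sent. This is a minimal sketch, assuming the documents are built as Python dicts on the client; `with_author_count` is a hypothetical helper, not part of any Elasticsearch client. It counts distinct authors (an assumption on my part), because term filters cannot tell a repeated name from a single one.

```python
import json


def with_author_count(doc):
    """Return a copy of doc with authors_counts set to the number of distinct authors."""
    enriched = dict(doc)
    enriched["authors_counts"] = len(set(doc.get("authors", [])))
    return enriched


doc = {"authors": ["Albert", "Richard"], "message": "Blublu"}
body = json.dumps(with_author_count(doc))

# body is the JSON string to pass as the request body, e.g. to curl -d.
print(body)
```

Whatever updates the authors list later has to refresh the count as well, otherwise the filter will silently miss the document.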
You can then build the query as suggested in the guide. It is very verbose, so I decided to generate it:
(ns query-gen
(:require [clojure.data.json :as json]
[clojure.math.combinatorics :refer [subsets]]))
(defn gen-filter [items]
(let [terms (map (fn [term] { "term" { "authors" term } }) items)
terms_count { "term" { "authors_counts" (count items) }}]
{ "bool" { "must" (cons terms_count terms)}}))
(defn gen-query [names]
(let [subsets (rest (subsets names))
filters (map gen-filter subsets)]
{"query" { "filtered" { "filter" { "or" filters }}}}))
(defn -main [& args]
(let [ query (gen-query ["Albert" "Richard" "Erwin"])
json (json/write-str query)]
(println json)))
This generates a query like the following:
{
"query": {
"filtered": {
"filter": {
"or": [
{
"bool": {
"must": [
{
"term": {
"authors_counts": 1
}
},
{
"term": {
"authors": "Albert"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 1
}
},
{
"term": {
"authors": "Richard"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 1
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 2
}
},
{
"term": {
"authors": "Albert"
}
},
{
"term": {
"authors": "Richard"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 2
}
},
{
"term": {
"authors": "Albert"
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 2
}
},
{
"term": {
"authors": "Richard"
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 3
}
},
{
"term": {
"authors": "Albert"
}
},
{
"term": {
"authors": "Richard"
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
}
]
}
}
}
}
When it is used like this:
curl -XGET "${HOST}/_search?pretty=true" -d '{
"query": {
"filtered": {
"filter": {
"or": [
{
"bool": {
"must": [
{
"term": {
"authors_counts": 1
}
},
{
"term": {
"authors": "Albert"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 1
}
},
{
"term": {
"authors": "Richard"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 1
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 2
}
},
{
"term": {
"authors": "Albert"
}
},
{
"term": {
"authors": "Richard"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 2
}
},
{
"term": {
"authors": "Albert"
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 2
}
},
{
"term": {
"authors": "Richard"
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"authors_counts": 3
}
},
{
"term": {
"authors": "Albert"
}
},
{
"term": {
"authors": "Richard"
}
},
{
"term": {
"authors": "Erwin"
}
}
]
}
}
]
}
}
}
}'
it returns what I think you expect:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.0,
"hits": [
{
"_index": "articles",
"_type": "article",
"_id": "3",
"_score": 1.0,
"_source": {
"authors": [
"Albert"
],
"authors_counts": 1,
"message": "Bleble"
}
},
{
"_index": "articles",
"_type": "article",
"_id": "2",
"_score": 1.0,
"_source": {
"authors": [
"Albert",
"Richard"
],
"authors_counts": 2,
"message": "Blublu"
}
}
]
}
}
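As a sanity check that needs no cluster, the generated filter can be rebuilt and evaluated against the three sample documents in plain Python. This is only a sketch: `gen_filter` and `gen_query` mirror the Clojure functions above, while `term_matches` and `doc_matches` are hypothetical helpers that imitate term matching in memory and are not Elasticsearch's actual evaluation.

```python
from itertools import combinations


def gen_filter(names):
    """One bool/must clause: the exact author count plus one term per name."""
    must = [{"term": {"authors_counts": len(names)}}]
    must += [{"term": {"authors": name}} for name in names]
    return {"bool": {"must": must}}


def gen_query(names):
    """OR together one clause per non-empty subset of names."""
    filters = [
        gen_filter(combo)
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]
    return {"query": {"filtered": {"filter": {"or": filters}}}}


def term_matches(term, doc):
    """Evaluate a single term clause against an in-memory document."""
    field, value = next(iter(term["term"].items()))
    stored = doc.get(field)
    if isinstance(stored, list):
        return value in stored
    return stored == value


def doc_matches(query, doc):
    """True if at least one bool/must clause has all of its terms satisfied."""
    clauses = query["query"]["filtered"]["filter"]["or"]
    return any(
        all(term_matches(term, doc) for term in clause["bool"]["must"])
        for clause in clauses
    )


docs = {
    1: {"authors": ["Albert", "Wolfgang", "Richard", "Murray"], "authors_counts": 4},
    2: {"authors": ["Albert", "Richard"], "authors_counts": 2},
    3: {"authors": ["Albert"], "authors_counts": 1},
}

query = gen_query(["Albert", "Richard", "Erwin"])
hits = sorted(doc_id for doc_id, doc in docs.items() if doc_matches(query, doc))
print(hits)  # [2, 3]
```

Article 1 is rejected because no clause asks for a count of 4, which is the role the authors_counts field plays in this workaround.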
I am not sure whether this is a good idea, but it was fun. Hope it helps.