
Complex Elasticsearch query


I have the following documents in my Elasticsearch index:

[{
        "_index": "ten2",
        "_type": "documents",
        "_id": "c323c2244a4a4c22_en-us",
        "_source": {
            "publish_details": [{
                    "environment": "603fe91adbdcff66",
                    "time": "2020-06-24T13:36:55.514Z",
                    "locale": "hi-in",
                    "user": "aadab2f531206e9d",
                    "version": 1
                },
                {
                    "environment": "603fe91adbdcff66",
                    "time": "2020-06-24T13:36:55.514Z",
                    "locale": "en-us",
                    "user": "aadab2f531206e9d",
                    "version": 1
                }
            ],
            "created_at": "2020-06-24T13:36:43.037Z",
            "_in_progress": false,
            "title": "Entry 1",
            "locale": "en-us",
            "url": "/entry-1",
            "tags": [],
            "uid": "c323c2244a4a4c22",
            "updated_at": "2020-06-24T13:36:43.037Z",
            "fields": []
        }
    },
    {
        "_index": "ten2",
        "_type": "documents",
        "_id": "c323c2244a4a4c22_mr-in",
        "_source": {
            "publish_details": [{
                "environment": "603fe91adbdcff66",
                "time": "2020-06-24T13:37:26.205Z",
                "locale": "mr-in",
                "user": "aadab2f531206e9d",
                "version": 1
            }],
            "created_at": "2020-06-24T13:36:43.037Z",
            "_in_progress": false,
            "title": "Entry 1 marathi",
            "locale": "mr-in",
            "url": "/entry-1",
            "tags": [],
            "uid": "c323c2244a4a4c22",
            "updated_at": "2020-06-24T13:37:20.092Z",
            "fields": []
        }
    }
]
I want the result of this query to be empty. Here, both documents share the same uid. I am fetching results with the following query:

{
    "query": {
        "bool": {
            "must": [{
                "bool": {
                    "must_not": [{
                        "bool": {
                            "must": [{
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.environment": "603fe91adbdcff66"
                                        }
                                    }
                                }
                            }, {
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.locale": "en-us"
                                        }
                                    }
                                }
                            }, {
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.locale": "hi-in"
                                        }
                                    }
                                }
                            }, {
                                "nested": {
                                    "path": "publish_details",
                                    "query": {
                                        "term": {
                                            "publish_details.locale": "mr-in"
                                        }
                                    }
                                }
                            }]
                        }
                    }]
                }
            }]
        }
    }
}
But the query above returns both documents, while I want the result to be blank. The reason is that the uid is common to both documents, and between them that uid's publish details cover all three locales. So please help me find a way to get the correct result, whether through an aggregation query or anything else. This is just a sample; I have many documents to filter out. Kindly help me here.

{
  "aggs": {
    "agg1": {
      "terms": {
        "field": "uid.raw"
      },
      "aggs": {
        "agg2": {
          "nested": {
            "path": "publish_details"
          },
          "aggs": {
            "locales": {
              "terms": {
                "field": "publish_details.locale"
              }
            }
          }
        }
      }
    }
  }
}
This query first groups the documents by uid, and then by publish_details.locale.

It gives a result like this:

"aggregations": {
        "agg1": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "c323c2244a4a4c22",
                    "doc_count": 2,
                    "agg2": {
                        "doc_count": 3,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                },
                                {
                                    "key": "mr-in",
                                    "doc_count": 1
                                }
                            ]
                        }
                    }
                },
                {
                    "key": "c323c2244rrffa4a4c22",
                    "doc_count": 1,
                    "agg2": {
                        "doc_count": 2,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                }
                            ]
                        }
                    }
                }
                            ]
            }
        }
I have three documents here: two share the same uid and one is different.

I will update the query further to drop the first result, the one with 3 locale buckets. You can also handle that step in your own code.
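The client-side handling mentioned above could look like the following. This is a minimal Python sketch, assuming the aggregation response has the shape shown in the result above (agg1/agg2 names); the function name is hypothetical:

```python
# Client-side filtering of the aggregation response: keep only uids whose
# publish_details do NOT yet cover all three required locales.
REQUIRED_LOCALES = {"en-us", "hi-in", "mr-in"}

def incomplete_uids(response):
    """Return the uids whose published locales do not cover REQUIRED_LOCALES."""
    result = []
    for bucket in response["aggregations"]["agg1"]["buckets"]:
        locales = {b["key"] for b in bucket["agg2"]["locales"]["buckets"]}
        if not REQUIRED_LOCALES.issubset(locales):
            result.append(bucket["key"])
    return result

# Example using the two buckets from the result above:
sample = {"aggregations": {"agg1": {"buckets": [
    {"key": "c323c2244a4a4c22",
     "agg2": {"locales": {"buckets": [
         {"key": "en-us"}, {"key": "hi-in"}, {"key": "mr-in"}]}}},
    {"key": "c323c2244rrffa4a4c22",
     "agg2": {"locales": {"buckets": [
         {"key": "en-us"}, {"key": "hi-in"}]}}}]}}}

print(incomplete_uids(sample))  # -> ['c323c2244rrffa4a4c22']
```

The first bucket covers all three locales, so it is dropped; only the uid that is missing a locale is returned.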

You can do it this way. It is fine for 100k documents, but when you have millions of them you should make sure you have enough resources to run this kind of aggregation.

{
  "size" : 0,
  "query":{
      "bool" :{
          "must_not":{
              "match":{
                "publish_details.environment":"603fe91adbdcff66"
              }
          }
      }
  },
  "aggs": {
    "uids": {
      "terms": {
        "field": "uid.raw"
      },
      "aggs": {
        "details": {
          "nested": {
            "path": "publish_details"
          },
          "aggs": {
            "locales": {
              "terms": {
                "field": "publish_details.locale"
              }
            },   
            "unique_locales": {
                "value_count": {
                    "field": "publish_details.locale"
                }
            }
          }
        }
      }
    }
  }
}
Result:

"aggregations": {
        "uids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "c323c2244a4a4c22",
                    "doc_count": 2,
                    "details": {
                        "doc_count": 3,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                },
                                {
                                    "key": "mr-in",
                                    "doc_count": 1
                                }
                            ]
                        },
                        "unique_locales": {
                            "value": 3
                        }
                    }
                },
                {
                    "key": "c323c2244rrffa4a4c22",
                    "doc_count": 1,
                    "details": {
                        "doc_count": 2,
                        "locales": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {
                                    "key": "en-us",
                                    "doc_count": 1
                                },
                                {
                                    "key": "hi-in",
                                    "doc_count": 1
                                }
                            ]
                        },
                        "unique_locales": {
                            "value": 2
                        }
                    }
                }
            ]
        }
    }
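To drop the fully published buckets on the server side instead of in client code, a bucket_selector pipeline aggregation could be added under the uids terms aggregation. This is an untested sketch: it assumes the same field names as the queries above, uses cardinality (distinct locale count) instead of value_count, and relies on the buckets_path being able to traverse the single-bucket nested aggregation:

```json
{
  "size": 0,
  "aggs": {
    "uids": {
      "terms": { "field": "uid.raw" },
      "aggs": {
        "details": {
          "nested": { "path": "publish_details" },
          "aggs": {
            "unique_locales": {
              "cardinality": { "field": "publish_details.locale" }
            }
          }
        },
        "not_fully_published": {
          "bucket_selector": {
            "buckets_path": { "locale_count": "details>unique_locales" },
            "script": "params.locale_count < 3"
          }
        }
      }
    }
  }
}
```

With this selector, uid buckets whose publish details already cover all three locales are removed from the response, so only the incomplete uids remain.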

Please help me adjust the above query to get the correct result.

Your question is unclear. @Gibbs

I have shared my list of Elasticsearch documents and my query. I want an empty result, but my query returns all the documents. So I want a query that gives me a blank result, based on publish_details.locale and publish_details.environment.

You mean the uid is the same and you check all 3 locales!?

Yes, both documents have the same value in the uid field.

Thanks for your reply. But I have more than 100k documents that need to be filtered. I can also use publish_details.environment here. That is, I need to ignore documents that share a uid, whose publish_details.environment is 603fe91adbdcff66, and whose publish_details.locale covers hi-in, en-us and mr-in. Is aggregation required here, or is there any direct way to get the result, like the query I was using? The reason I ask is that when aggregations are combined with a query, both hits and aggregations appear in the result, and I need the full details of the remaining documents.

It won't return hits, because I set the size to 0. I don't think there is a way without aggregations, since the condition spans multiple documents.

Thanks, it won't return hits, but what about the other fields in the documents? Say I have 3 documents and I want only one of them in the response along with its other fields, such as title, url, etc. How do we get those fields through aggregations?

You are asking many questions in one post. If the current question is solved, you can accept the answer and open another question; anyone will be able to help you there. Is the title/url of each document unique / identical / could it be anything? I suspect you need hits, but then you would have to aggregate on those fields.
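Regarding the comment about retrieving other fields such as title and url alongside the aggregation: a top_hits sub-aggregation can return source documents per bucket even when the outer size is 0. A sketch, assuming the same field names as above:

```json
{
  "size": 0,
  "aggs": {
    "uids": {
      "terms": { "field": "uid.raw" },
      "aggs": {
        "sample_doc": {
          "top_hits": {
            "size": 1,
            "_source": ["title", "url", "locale"]
          }
        }
      }
    }
  }
}
```

Each uid bucket then carries one matching document's selected _source fields, so the title and url can be read from the aggregation response without separate hit results.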