Elasticsearch query for the count of distinct values of one field with a where condition on another field

I want to query my Elasticsearch index in a way similar to the following Postgres query:

    select count(distinct(candidate_id)) from candidate_ranking cr
    where badge='1'

Consider the sample index below, which contains a few documents:

{
  "id": 295537,
  "candidate_id": 29492,
  "created_at": "2021-03-30T02:23:42.077149+00:00",
  "badge": "1"
}
{
  "id": 271179,
  "candidate_id": 29492,
  "created_at": "2021-03-30T01:19:59.803999+00:00",
  "badge": "1"
}
{
  "id": 247169,
  "candidate_id": 29492,
  "created_at": "2021-03-30T00:16:04.077245+00:00",
  "badge": "1"
}
{
  "id": 247156,
  "candidate_id": 29332,
  "created_at": "2021-03-30T00:17:04.077245+00:00",
  "badge": "1"
}
{
  "id": 225434,
  "candidate_id": 24493,
  "created_at": "2021-03-29T23:13:59.266074+00:00",
  "badge": null
}
{
  "id": 192999,
  "candidate_id": 24493,
  "created_at": "2021-03-29T22:20:24.942116+00:00",
  "badge": null
}
{
  "id": 177712,
  "candidate_id": 24493,
  "created_at": "2021-03-29T21:33:32.596613+00:00",
  "badge": null
}
{
  "id": 162916,
  "candidate_id": 24493,
  "created_at": "2021-03-29T21:05:03.985032+00:00",
  "badge": null
}
{
  "id": 148136,
  "candidate_id": 23422,
  "created_at": "2021-03-29T20:20:36.482066+00:00",
  "badge": "2"
}
{
  "id": 118558,
  "candidate_id": 23422,
  "created_at": "2021-03-27T01:34:29.628550+00:00",
  "badge": "2"
}
{
  "id": 133354,
  "candidate_id": 23422,
  "created_at": "2021-03-27T02:11:35.811420+00:00",
  "badge": "2"
}

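For the data above, the count I expect is 2, because candidate_id 29492 and 29332 have badge 1. My ES index contains many documents that share the same candidate_id but differ in the created_at field.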
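For reference, the expected result can be reproduced outside Elasticsearch. The Python sketch below applies the same filter-then-distinct logic to the sample documents, reduced to the two fields that matter. The names `SAMPLE_DOCS` and `count_distinct_candidates` are illustrative only and do not come from the original post.

```python
# Sample documents from the question, reduced to candidate_id and badge.
SAMPLE_DOCS = [
    {"candidate_id": 29492, "badge": "1"},
    {"candidate_id": 29492, "badge": "1"},
    {"candidate_id": 29492, "badge": "1"},
    {"candidate_id": 29332, "badge": "1"},
    {"candidate_id": 24493, "badge": None},
    {"candidate_id": 24493, "badge": None},
    {"candidate_id": 24493, "badge": None},
    {"candidate_id": 24493, "badge": None},
    {"candidate_id": 23422, "badge": "2"},
    {"candidate_id": 23422, "badge": "2"},
    {"candidate_id": 23422, "badge": "2"},
]


def count_distinct_candidates(docs, badge):
    """Count distinct candidate_id values among docs whose badge matches."""
    return len({doc["candidate_id"] for doc in docs if doc["badge"] == badge})


print(count_distinct_candidates(SAMPLE_DOCS, "1"))  # 2
```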

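You need to combine several aggregations: a terms aggregation, a top hits aggregation, and a max aggregation.

Then use a stats bucket aggregation to get the number of buckets: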

{
  "size": 0,
  "aggs": {
    "badge_1": {
      "terms": {
        "field": "badge.keyword",
        "include": [
          "1"
        ],
        "size": 10
      },
      "aggs": {
        "unique_id": {
          "terms": {
            "field": "candidate_id",
            "size": 10,
            "order": {
              "latestOrder": "desc"
            }
          },
          "aggs": {
            "top_doc": {
              "top_hits": {
                "size": 1
              }
            },
            "latestOrder": {
              "max": {
                "field": "created_at"
              }
            }
          }
        },
        "bucketcount": {
          "stats_bucket": {
            "buckets_path": "unique_id._count"
          }
        }
      }
    }
  }
}

The search result will be:

  "aggregations": {
    "badge_1": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 4,
          "unique_id": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 29492,
                "doc_count": 3,
                "latestOrder": {
                  "value": 1.617071022077E12,
                  "value_as_string": "2021-03-30T02:23:42.077000Z"
                },
                "top_doc": {
                  "hits": {
                    "total": {
                      "value": 3,
                      "relation": "eq"
                    },
                    "max_score": 1.0,
                    "hits": [
                      {
                        "_index": "67162554",
                        "_type": "_doc",
                        "_id": "1",
                        "_score": 1.0,
                        "_source": {
                          "id": 295537,
                          "candidate_id": 29492,
                          "created_at": "2021-03-30T02:23:42.077149+00:00",
                          "badge": "1"
                        }
                      }
                    ]
                  }
                }
              },
              {
                "key": 29332,
                "doc_count": 1,
                "latestOrder": {
                  "value": 1.617063424077E12,
                  "value_as_string": "2021-03-30T00:17:04.077000Z"
                },
                "top_doc": {
                  "hits": {
                    "total": {
                      "value": 1,
                      "relation": "eq"
                    },
                    "max_score": 1.0,
                    "hits": [
                      {
                        "_index": "67162554",
                        "_type": "_doc",
                        "_id": "4",
                        "_score": 1.0,
                        "_source": {
                          "id": 247156,
                          "candidate_id": 29332,
                          "created_at": "2021-03-30T00:17:04.077245+00:00",
                          "badge": "1"
                        }
                      }
                    ]
                  }
                }
              }
            ]
          },
          "bucketcount": {
            "count": 2,        // note this
            "min": 1.0,
            "max": 3.0,
            "avg": 2.0,
            "sum": 4.0
          }
        }
      ]
    }
  }
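On the client side, the distinct count is the `count` field of the `bucketcount` result. The Python sketch below reads it from a response that has already been parsed into a dict; the helper name and the trimmed response are illustrative assumptions, not part of the original answer. Note that `stats_bucket` only counts the buckets that `unique_id` actually returns, so with `"size": 10` the reported count cannot exceed 10.

```python
def distinct_count_from_stats_bucket(response, terms_agg="badge_1",
                                     pipeline_agg="bucketcount"):
    """Return the bucket count reported by the stats_bucket aggregation."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    if not buckets:
        # No document matched the badge value, so there is nothing to count.
        return 0
    return buckets[0][pipeline_agg]["count"]


# Trimmed copy of the response shown above (only the fields that are read).
trimmed_response = {
    "aggregations": {
        "badge_1": {
            "buckets": [
                {
                    "key": "1",
                    "doc_count": 4,
                    "bucketcount": {
                        "count": 2, "min": 1.0, "max": 3.0,
                        "avg": 2.0, "sum": 4.0,
                    },
                }
            ]
        }
    }
}

print(distinct_count_from_stats_bucket(trimmed_response))  # 2
```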

To count distinct values, you can use a terms aggregation and count the buckets in the result. Something like this:

  GET test_index/_search
  {
    "size": 0,
    "query": {
      "match": {
        "badge": "1"
      }
    }, 
    "aggs": {
      "candidate_aggs": {
        "terms": {
          "field": "candidate_id"
        }
      }
    }
  }

This returns the following:

  "aggregations" : {
    "candidate_aggs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 29492,
          "doc_count" : 3
        },
        {
          "key" : 29332,
          "doc_count" : 1
        }
      ]
    }
  }
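Counting the buckets is then a one-liner on the client. The Python sketch below does this for a parsed response. Because a terms aggregation returns only 10 buckets unless `size` is raised, it also builds a request body for a different technique that neither answer uses: the `cardinality` aggregation, which returns an approximate distinct count directly. The function names, the `distinct_candidates` aggregation name, and the use of `badge.keyword` (taken from the first answer's mapping) are assumptions for illustration.

```python
def count_buckets(response, agg_name="candidate_aggs"):
    """Number of distinct keys returned by a terms aggregation."""
    return len(response["aggregations"][agg_name]["buckets"])


def cardinality_body(badge):
    """Request body for an approximate distinct count via `cardinality`.

    Assumes `badge` has a `keyword` sub-field; adjust to the real mapping.
    The count is returned under aggregations.distinct_candidates.value.
    """
    return {
        "size": 0,
        "query": {"term": {"badge.keyword": badge}},
        "aggs": {
            "distinct_candidates": {
                "cardinality": {"field": "candidate_id"}
            }
        },
    }


# Parsed form of the response shown above.
sample_response = {
    "aggregations": {
        "candidate_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 29492, "doc_count": 3},
                {"key": 29332, "doc_count": 1},
            ],
        }
    }
}

print(count_buckets(sample_response))  # 2
```

If `sum_other_doc_count` in the terms response is greater than 0, some buckets were cut off and the bucket count understates the number of distinct values.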

Thank you very much. Yes, the solution works well.

@RicheshChouksey Glad I could help you :-)