Elasticsearch post_filter aggregation query

I am interested in all APIs that never returned a single 200 response (within a given time interval).

What I basically need is this:

     select url from api_log
      except/minus 
     select url from api_log where status='200'

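Translated to ES, I am trying an approach along these lines:

First, aggregate over everything.

Then, from that result, filter out every record that has a child with status 200.

ES sample data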

{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:20:51.108945",
        "out_time": "2019-05-13T17:20:51.145549",
        "duration": 36.6041660308838,
        "status": "200",
        "url": "/api/myFirstAPI"
    }
},
{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "2",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:20:57.915694",
        "out_time": "2019-05-13T17:20:57.941989",
        "duration": 26.2949466705322,
        "status": "403",
        "url": "/api/mySecondAPI"
    }
},
{
    "_index": "api_log",
    "_type": "_doc",
    "_id": "3",
    "_version": 1,
    "_score": 1,
    "_source": {
        "in_time": "2019-05-13T17:22:35.274372",
        "out_time": "2019-05-13T17:22:35.288944",
        "duration": 14.5719051361084,
        "status": "400",
        "url": "/api/myFirstAPI"
    }
}

For the data above, I expect the resulting set of URLs to be {'/api/mySecondAPI'}.
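The expected result can be checked outside Elasticsearch with a plain set difference, which mirrors the SQL except/minus above. This is only a minimal Python sketch: the `docs` list is the three sample documents reduced to their `url` and `status` fields, and the variable names are illustrative.

```python
# The three sample documents from above, reduced to the two relevant fields.
docs = [
    {"url": "/api/myFirstAPI", "status": "200"},
    {"url": "/api/mySecondAPI", "status": "403"},
    {"url": "/api/myFirstAPI", "status": "400"},
]

# select url from api_log
all_urls = {d["url"] for d in docs}

# select url from api_log where status='200'
ok_urls = {d["url"] for d in docs if d["status"] == "200"}

# except/minus: URLs that never returned a 200
never_ok = all_urls - ok_urls
print(never_ok)  # {'/api/mySecondAPI'}
```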

Request/response using only aggs

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword"
          }
        }
      }
    }
  }
}

Response to the above request

{
  "took" : 880,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "url" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 394668,
      "buckets" : [
        {
          "key" : "/api/myFirstRequest",
          "doc_count" : 1352845,
          "status" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "200",
                "doc_count" : 1187611
              },
              {
                "key" : "302",
                "doc_count" : 139932
              },
              {
                "key" : "401",
                "doc_count" : 22615
              },
              {
                "key" : "500",
                "doc_count" : 2250
              },
              {
                "key" : "403",
                "doc_count" : 437
              }
            ]
          }
        },
...
...
...

From the above, I need to pick out all the buckets (URLs) that have no child bucket with status "200".
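Until the filtering can be done inside the query, the same selection could be applied on the client to the aggregation response. The following is a rough Python sketch, not an Elasticsearch feature: the function name `urls_without_status` and the shortened `sample` response (including its second bucket, `/api/mySecondRequest`) are hypothetical. It also assumes the `terms` aggregations returned every URL and every status, which is not guaranteed with the default bucket size.

```python
def urls_without_status(response, agg_name="url", sub_agg="status", status="200"):
    """Return the keys of outer buckets that have no sub-bucket for `status`."""
    result = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        seen = {sub["key"] for sub in bucket[sub_agg]["buckets"]}
        if status not in seen:
            result.append(bucket["key"])
    return result


# Hypothetical, shortened response in the same shape as the one shown above.
sample = {
    "aggregations": {
        "url": {
            "buckets": [
                {
                    "key": "/api/myFirstRequest",
                    "doc_count": 1327543,
                    "status": {
                        "buckets": [
                            {"key": "200", "doc_count": 1187611},
                            {"key": "302", "doc_count": 139932},
                        ]
                    },
                },
                {
                    "key": "/api/mySecondRequest",
                    "doc_count": 437,
                    "status": {"buckets": [{"key": "403", "doc_count": 437}]},
                },
            ]
        }
    }
}

print(urls_without_status(sample))  # ['/api/mySecondRequest']
```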

I have gotten this far. It looks so close, yet so far... I can't figure out what should go in the type field.

Request with filter

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "page_name": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword"
          }
        }
      }
    }
  },
   "post_filter": {
      "bool": {
        "must_not": [
            {
                "has_child" : {
                    "type" : "?????",
                    "query" : {
                        "term" : {"status" : "200"}
                    }
                }
            }
        ]
      }
    }
}

Sample input (from Apache logs):

t1 /api/FirstAPI 200  <-- Eliminate First API completely
t2 /api/FirstAPI 400
t3 /api/FirstAPI 403
t4 /api/SecondAPI 403
t5 /api/SecondAPI 400
t6 /api/ThirdAPI 500
t7 /api/ThirdAPI 500
t8 /api/SecondAPI 200   <---Eliminate Second API completely
t9 /api/ThirdAPI 500
t10 /api/ThirdAPI 403

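If I understand correctly, you only want to exclude 200 from the aggregations. I don't see a reason to use post_filter here. You can use exclude in the terms aggregation. The 200 responses are still counted in the doc_count field of each URL bucket, but the 200 bucket itself is excluded from the aggregation response and will not be shown.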

POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword",
            "exclude": "200"
          }
        }
      }
    }
  }
}

Alternative:

Based on your input, it looks like you want 200 to be part of the result set (since you are using post_filter), but if that is not the case, here is another approach. Aggregations run on the query results, so if you exclude 200 from the result set with must_not, there will be no bucket with status 200.

POST /api_log/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "status": [
              "200"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword"
      },
      "aggregations": {
        "status": {
          "terms": {
            "field": "status.keyword"
          }
        }
      }
    }
  }
}
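Note that both requests above only drop the 200 entries; a URL that also returned other statuses still appears as a bucket. If the goal is to drop such URLs entirely, as in the sample input, one option that may be worth trying is a `bucket_selector` pipeline aggregation that keeps only URL buckets whose count of 200 documents is zero. This is not part of the original answer and has not been run against this index. The Python sketch below only builds the request body; the helper name `build_never_ok_query`, the aggregation names `ok_hits` and `never_ok`, and the `max_urls` size are assumptions.

```python
import json


def build_never_ok_query(url_field="url.keyword", status_field="status.keyword",
                         ok_status="200", max_urls=1000):
    """Build a search body that keeps only URL buckets with zero `ok_status` docs."""
    return {
        "size": 0,
        "aggs": {
            "url": {
                # Assumption: max_urls is large enough to cover every distinct URL.
                "terms": {"field": url_field, "size": max_urls},
                "aggs": {
                    # Counts the documents in this URL bucket that have the OK status.
                    "ok_hits": {"filter": {"term": {status_field: ok_status}}},
                    # Keeps the URL bucket only when that count is zero.
                    "never_ok": {
                        "bucket_selector": {
                            "buckets_path": {"ok_count": "ok_hits>_count"},
                            "script": "params.ok_count == 0",
                        }
                    },
                },
            }
        },
    }


body = build_never_ok_query()
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of POST /api_log/_search. A range filter on in_time would still need to be added to the query to restrict it to a time interval.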

Can you add some sample api_log data and the mapping/schema?

Added more information as you requested, @TheUknown.

This is not what I'm looking for. Please see my question again; I have edited it and added sample data to make it clearer, @TheUknown.