Elasticsearch post_filter aggregation query

I'm interested in all APIs that never returned a single 200 response (within a given time interval).

I basically need this:

     select url from api_log
      except/minus 
     select url from api_log where status='200'
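Translated to ES, I'm trying an approach along these lines:

  • First, aggregate over everything
  • Then, from that result, filter out every record that has a child entry with status 200

ES sample data: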

    {
        "_index": "api_log",
        "_type": "_doc",
        "_id": "1",
        "_version": 1,
        "_score": 1,
        "_source": {
            "in_time": "2019-05-13T17:20:51.108945",
            "out_time": "2019-05-13T17:20:51.145549",
            "duration": 36.6041660308838,
            "status": "200",
            "url": "/api/myFirstAPI"
        }
    },
    {
        "_index": "api_log",
        "_type": "_doc",
        "_id": "2",
        "_version": 1,
        "_score": 1,
        "_source": {
            "in_time": "2019-05-13T17:20:57.915694",
            "out_time": "2019-05-13T17:20:57.941989",
            "duration": 26.2949466705322,
            "status": "403",
            "url": "/api/mySecondAPI"
        }
    },
    {
        "_index": "api_log",
        "_type": "_doc",
        "_id": "3",
        "_version": 1,
        "_score": 1,
        "_source": {
            "in_time": "2019-05-13T17:22:35.274372",
            "out_time": "2019-05-13T17:22:35.288944",
            "duration": 14.5719051361084,
            "status": "400",
            "url": "/api/myFirstAPI"
        }
    }
    
For the data above, I expect the resulting url to be {'/api/mySecondAPI'}

Request/response using aggs only

    POST /api_log/_search
    {
      "size": 0,
      "aggs": {
        "url": {
          "terms": {
            "field": "url.keyword"
          },
          "aggregations": {
            "status": {
              "terms": {
                "field": "status.keyword"
              }
            }
          }
        }
      }
    }
    
    
Response to the above request

    {
      "took" : 880,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 10000,
          "relation" : "gte"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "url" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 394668,
          "buckets" : [
            {
              "key" : "/api/myFirstRequest",
              "doc_count" : 1352845,
              "status" : {
                "doc_count_error_upper_bound" : 0,
                "sum_other_doc_count" : 0,
                "buckets" : [
                  {
                    "key" : "200",
                    "doc_count" : 1187611
                  },
                  {
                    "key" : "302",
                    "doc_count" : 139932
                  },
                  {
                    "key" : "401",
                    "doc_count" : 22615
                  },
                  {
                    "key" : "500",
                    "doc_count" : 2250
                  },
                  {
                    "key" : "403",
                    "doc_count" : 437
                  }
                ]
              }
            },
    ...
    ...
    ...
    
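From the above, I need to keep only the buckets (URLs) that have no child bucket with status "200".

I've gotten this far. It looks close, yet it is still far off: I can't work out what should go in the type field.

Request with filter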

    POST /api_log/_search
    {
      "size": 0,
      "aggs": {
        "page_name": {
          "terms": {
            "field": "url.keyword"
          },
          "aggregations": {
            "status": {
              "terms": {
                "field": "status.keyword"
              }
            }
          }
        }
      },
       "post_filter": {
          "bool": {
            "must_not": [
                {
                    "has_child" : {
                        "type" : "?????",
                        "query" : {
                            "term" : {"status" : "200"}
                        }
                    }
                }
            ]
          }
        }
    }
    
Sample input (from Apache logs):

    t1 /api/FirstAPI 200  <-- Eliminate First API completely
    t2 /api/FirstAPI 400
    t3 /api/FirstAPI 403
    t4 /api/SecondAPI 403
    t5 /api/SecondAPI 400
    t6 /api/ThirdAPI 500
    t7 /api/ThirdAPI 500
    t8 /api/SecondAPI 200   <---Eliminate Second API completely
    t9 /api/ThirdAPI 500
    t10 /api/ThirdAPI 403
    

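If I understand correctly, you only want to exclude 200 from the aggregation. I don't see a reason to use post_filter here. You can use the exclude parameter of the terms aggregation. It still counts all the 200 responses in the parent bucket's doc_count, but the 200 bucket itself is excluded from the aggregation response and will not be shown.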

    POST /api_log/_search
    {
      "size": 0,
      "aggs": {
        "url": {
          "terms": {
            "field": "url.keyword"
          },
          "aggregations": {
            "status": {
              "terms": {
                "field": "status.keyword",
                "exclude": "200"
              }
            }
          }
        }
      }
    }
    
Alternative:

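Based on your input, it looks like you want 200 to be part of the result set (since you are using post_filter). If that is not the case, here is another approach. Aggregations are computed over the documents the query matches, so if you exclude 200 from the result set with must_not, there will be no bucket with status 200.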

    POST /api_log/_search
    {
      "size": 0,
      "query": {
        "bool": {
          "must_not": [
            {
              "terms": {
                "status": [
                  "200"
                ]
              }
            }
          ]
        }
      },
      "aggs": {
        "url": {
          "terms": {
            "field": "url.keyword"
          },
          "aggregations": {
            "status": {
              "terms": {
                "field": "status.keyword"
              }
            }
          }
        }
      }
    }
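
Neither of the two requests above drops a URL entirely once it has returned a 200, which is what the except/minus in the question describes. Here is a minimal sketch of one way to express that, assuming the same url.keyword and status.keyword sub-fields used above: count the 200 documents per URL with a filter sub-aggregation, then keep only the buckets where that count is zero using a bucket_selector pipeline aggregation. The names ok and no_200_only and the size of 1000 are placeholders chosen for illustration, not anything from the original post.

```
# Sketch only: "ok" and "no_200_only" are hypothetical names, and size 1000 is an arbitrary assumption.
POST /api_log/_search
{
  "size": 0,
  "aggs": {
    "url": {
      "terms": {
        "field": "url.keyword",
        "size": 1000
      },
      "aggs": {
        "ok": {
          "filter": {
            "term": { "status.keyword": "200" }
          }
        },
        "no_200_only": {
          "bucket_selector": {
            "buckets_path": { "ok_count": "ok>_count" },
            "script": "params.ok_count == 0"
          }
        }
      }
    }
  }
}
```

On the three sample documents from the question this should leave only /api/mySecondAPI, because /api/myFirstAPI has one 200. Keep in mind that bucket_selector only sees the buckets the terms aggregation actually returns, so size has to be large enough to cover every URL. To limit the check to a time window, a range query on in_time could be added at the top level of the request.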
    


Can you add some sample api_log data and the mapping/schema?
Added more information as you requested @theuknown
That's not what I'm looking for. Please see my question, which I have edited again. I've added sample data to make it clearer @TheUknown