elasticsearch 加总,elasticsearch,elasticsearch" /> elasticsearch 加总,elasticsearch,elasticsearch" />

elasticsearch 加总

elasticsearch 加总,elasticsearch,elasticsearch,简言之,问题是:如果我有一个每个bucket的top_点击数的聚合,那么如何对结果结构中的特定值求和 详情: 我有很多记录,每个商店都有一定数量的记录。我想得到每个商店所有最新记录的总和 为了获得每个存储的最新记录,我创建了以下聚合: "latest_quantity_per_store": { "aggs": { "latest_quantity": { "top_hits": { "sort": [

简言之,问题是:如果我有一个每个bucket的top_点击数的聚合,那么如何对结果结构中的特定值求和

详情:

我有很多记录,每个商店都有一定数量的记录。我想得到每个商店所有最新记录的总和

为了获得每个存储的最新记录,我创建了以下聚合:

"latest_quantity_per_store": {
    "aggs": {
        "latest_quantity": {
            "top_hits": {
                "sort": [
                    {
                        "datetime": "desc"
                    },
                    {
                        "quantity": "asc"
                    }
                ],
                "_source": {
                    "includes": [
                        "quantity"
                    ]
                },
                "size": 1
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}
"latest_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "O6wFD2UBG8e7nvSU8dYg",
                            "_score": null,
                            "_source": {
                                "quantity": 6
                            },
                            "sort": [
                                1532476800000,
                                6
                            ]
                        }
                    ]
                }
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "pLUFD2UBHBuSGcoH0ZT4",
                            "_score": null,
                            "_source": {
                                "quantity": 11
                            },
                            "sort": [
                                1532476800000,
                                11
                            ]
                        }
                    ]
                }
            }
        }
    ]
}
"latest_quantity": {
    "sum_bucket": {
        "buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
    }
}
假设我有两个存储,每个存储有两个不同时间戳的数量。这是该聚合的结果:

"latest_quantity_per_store": {
    "aggs": {
        "latest_quantity": {
            "top_hits": {
                "sort": [
                    {
                        "datetime": "desc"
                    },
                    {
                        "quantity": "asc"
                    }
                ],
                "_source": {
                    "includes": [
                        "quantity"
                    ]
                },
                "size": 1
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}
"latest_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "O6wFD2UBG8e7nvSU8dYg",
                            "_score": null,
                            "_source": {
                                "quantity": 6
                            },
                            "sort": [
                                1532476800000,
                                6
                            ]
                        }
                    ]
                }
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "pLUFD2UBHBuSGcoH0ZT4",
                            "_score": null,
                            "_source": {
                                "quantity": 11
                            },
                            "sort": [
                                1532476800000,
                                11
                            ]
                        }
                    ]
                }
            }
        }
    ]
}
"latest_quantity": {
    "sum_bucket": {
        "buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
    }
}
现在我想在ElasticSearch中有一个聚合,它将这些桶的总和取出来。在示例数据中,总和为6和11。我尝试了以下聚合:

"latest_quantity_per_store": {
    "aggs": {
        "latest_quantity": {
            "top_hits": {
                "sort": [
                    {
                        "datetime": "desc"
                    },
                    {
                        "quantity": "asc"
                    }
                ],
                "_source": {
                    "includes": [
                        "quantity"
                    ]
                },
                "size": 1
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}
"latest_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "O6wFD2UBG8e7nvSU8dYg",
                            "_score": null,
                            "_source": {
                                "quantity": 6
                            },
                            "sort": [
                                1532476800000,
                                6
                            ]
                        }
                    ]
                }
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "pLUFD2UBHBuSGcoH0ZT4",
                            "_score": null,
                            "_source": {
                                "quantity": 11
                            },
                            "sort": [
                                1532476800000,
                                11
                            ]
                        }
                    ]
                }
            }
        }
    ]
}
"latest_quantity": {
    "sum_bucket": {
        "buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
    }
}
但这会导致以下错误:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "inventory-local",
        "node": "3z5CqmmAQ-yT2sUCb69DzA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
        }
      }
    ]
  },
  "status": 400
}
从ElasticSearch中获取数字17的正确聚合是什么?

我对我拥有的另一个聚合做了类似的事情,一个平均值,而不是顶级点击率聚合

"average_quantity": {
    "sum_bucket": {
        "buckets_path": "average_quantity_per_store>average_quantity"
    }
},
"average_quantity_per_store": {
    "aggs": {
        "average_quantity": {
            "avg": {
                "field": "quantity"
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}
这与预期一样有效,结果如下:

"average_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "average_quantity": {
                "value": 6
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "average_quantity": {
                "value": 11.5
            }
        }
    ]
},
"average_quantity": {
    "value": 17.5
}

有一种方法可以通过混合使用聚合和管道聚合来解决这个问题。脚本化的度量聚合有点复杂,但其主要思想是允许您提供自己的bucketing算法并从中吐出单个度量值

在您的情况下,您要做的是计算每个门店的最新数量,然后将这些门店数量相加。解决方案如下所示,我将在下面解释一些细节:

POST inventory-local/_search
{
  "size": 0,
  "aggs": {
    "bystore": {
      "terms": {
        "field": "store.keyword",
        "size": 10000
      },
      "aggs": {
        "latest_quantity": {
          "scripted_metric": {
            "init_script": "params._agg.quantities = new TreeMap()",
            "map_script": "params._agg.quantities.put(doc.datetime.date, [doc.datetime.date.millis, doc.quantity.value])",
            "combine_script": "return params._agg.quantities.lastEntry().getValue()",
            "reduce_script": "def maxkey = 0; def qty = 0; for (a in params._aggs) {def currentKey = a[0]; if (currentKey > maxkey) {maxkey = currentKey; qty = a[1]} } return qty;"
          }
        }
      }
    },
    "sum_latest_quantities": {
      "sum_bucket": {
        "buckets_path": "bystore>latest_quantity.value"
      }
    }
  }
}
请注意,为了使其正常工作,您需要在
elasticsearch.yml
配置文件中设置
script.painless.regex.enabled:true

init_脚本
为每个碎片创建一个
TreeMap
map\u脚本用日期/数量的映射填充每个碎片上的
TreeMap
。我们在映射中输入的值包含单个字符串中的时间戳和数量。稍后在
reduce\u脚本中需要该时间戳。
combine\u脚本
只取
TreeMap
的最后一个值,因为这是给定碎片的最新数量。 大部分工作位于
reduce\u脚本中。我们迭代每个碎片的所有最新数量,并返回最新数量

现在,我们有每家商店的最新数量。剩下要做的就是使用一个
sum_bucket
管道聚合来对每个存储数量进行求和。这就是17的结果

响应如下所示:

 "aggregations": {
    "bystore": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "01",
          "doc_count": 2,
          "latest_quantity": {
            "value": 6
          }
        },
        {
          "key": "02",
          "doc_count": 2,
          "latest_quantity": {
            "value": 11
          }
        }
      ]
    },
    "sum_latest_quantities": {
      "value": 17
    }
  }

谢谢你知道我如何在AWS托管的elasticsearch域上更改
elasticsearch.yml
文件吗?嗯,好问题。我不确定你真的可以。我知道您可以设置一些白名单设置,但不确定AWS是否允许更改此设置。你应该考虑迁移到替代(由AWS支持,但更灵活)感谢链接!同时,我在AWS论坛上也提出了同样的问题:答案是:我们可能不需要使用regexp来拆分字符串,问题是。我马上报告……我已经更新了答案,只需使用数组而不是字符串。不再需要分裂了。