elasticsearch 加总
简言之,问题是:如果我有一个每个bucket的top_点击数的聚合,那么如何对结果结构中的特定值求和 详情: 我有很多记录,每个商店都有一定数量的记录。我想得到每个商店所有最新记录的总和 为了获得每个存储的最新记录,我创建了以下聚合:elasticsearch 加总,elasticsearch,elasticsearch,简言之,问题是:如果我有一个每个bucket的top_点击数的聚合,那么如何对结果结构中的特定值求和 详情: 我有很多记录,每个商店都有一定数量的记录。我想得到每个商店所有最新记录的总和 为了获得每个存储的最新记录,我创建了以下聚合: "latest_quantity_per_store": { "aggs": { "latest_quantity": { "top_hits": { "sort": [
"latest_quantity_per_store": {
"aggs": {
"latest_quantity": {
"top_hits": {
"sort": [
{
"datetime": "desc"
},
{
"quantity": "asc"
}
],
"_source": {
"includes": [
"quantity"
]
},
"size": 1
}
}
},
"terms": {
"field": "store",
"size": 10000
}
}
"latest_quantity_per_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "01",
"doc_count": 2,
"latest_quantity": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "inventory-local",
"_type": "doc",
"_id": "O6wFD2UBG8e7nvSU8dYg",
"_score": null,
"_source": {
"quantity": 6
},
"sort": [
1532476800000,
6
]
}
]
}
}
},
{
"key": "02",
"doc_count": 2,
"latest_quantity": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "inventory-local",
"_type": "doc",
"_id": "pLUFD2UBHBuSGcoH0ZT4",
"_score": null,
"_source": {
"quantity": 11
},
"sort": [
1532476800000,
11
]
}
]
}
}
}
]
}
"latest_quantity": {
"sum_bucket": {
"buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
}
}
假设我有两个存储,每个存储有两个不同时间戳的数量。这是该聚合的结果:
"latest_quantity_per_store": {
"aggs": {
"latest_quantity": {
"top_hits": {
"sort": [
{
"datetime": "desc"
},
{
"quantity": "asc"
}
],
"_source": {
"includes": [
"quantity"
]
},
"size": 1
}
}
},
"terms": {
"field": "store",
"size": 10000
}
}
"latest_quantity_per_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "01",
"doc_count": 2,
"latest_quantity": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "inventory-local",
"_type": "doc",
"_id": "O6wFD2UBG8e7nvSU8dYg",
"_score": null,
"_source": {
"quantity": 6
},
"sort": [
1532476800000,
6
]
}
]
}
}
},
{
"key": "02",
"doc_count": 2,
"latest_quantity": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "inventory-local",
"_type": "doc",
"_id": "pLUFD2UBHBuSGcoH0ZT4",
"_score": null,
"_source": {
"quantity": 11
},
"sort": [
1532476800000,
11
]
}
]
}
}
}
]
}
"latest_quantity": {
"sum_bucket": {
"buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
}
}
现在我想在ElasticSearch中有一个聚合,它将这些桶的总和取出来。在示例数据中,总和为6和11。我尝试了以下聚合:
"latest_quantity_per_store": {
"aggs": {
"latest_quantity": {
"top_hits": {
"sort": [
{
"datetime": "desc"
},
{
"quantity": "asc"
}
],
"_source": {
"includes": [
"quantity"
]
},
"size": 1
}
}
},
"terms": {
"field": "store",
"size": 10000
}
}
"latest_quantity_per_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "01",
"doc_count": 2,
"latest_quantity": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "inventory-local",
"_type": "doc",
"_id": "O6wFD2UBG8e7nvSU8dYg",
"_score": null,
"_source": {
"quantity": 6
},
"sort": [
1532476800000,
6
]
}
]
}
}
},
{
"key": "02",
"doc_count": 2,
"latest_quantity": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "inventory-local",
"_type": "doc",
"_id": "pLUFD2UBHBuSGcoH0ZT4",
"_score": null,
"_source": {
"quantity": 11
},
"sort": [
1532476800000,
11
]
}
]
}
}
}
]
}
"latest_quantity": {
"sum_bucket": {
"buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
}
}
但这会导致以下错误:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "inventory-local",
"node": "3z5CqmmAQ-yT2sUCb69DzA",
"reason": {
"type": "illegal_argument_exception",
"reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
}
}
]
},
"status": 400
}
从ElasticSearch中获取数字17的正确聚合是什么?
我对我拥有的另一个聚合做了类似的事情,一个平均值,而不是顶级点击率聚合
"average_quantity": {
"sum_bucket": {
"buckets_path": "average_quantity_per_store>average_quantity"
}
},
"average_quantity_per_store": {
"aggs": {
"average_quantity": {
"avg": {
"field": "quantity"
}
}
},
"terms": {
"field": "store",
"size": 10000
}
}
这与预期一样有效,结果如下:
"average_quantity_per_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "01",
"doc_count": 2,
"average_quantity": {
"value": 6
}
},
{
"key": "02",
"doc_count": 2,
"average_quantity": {
"value": 11.5
}
}
]
},
"average_quantity": {
"value": 17.5
}
有一种方法可以通过混合使用聚合和管道聚合来解决这个问题。脚本化的度量聚合有点复杂,但其主要思想是允许您提供自己的bucketing算法并从中吐出单个度量值 在您的情况下,您要做的是计算每个门店的最新数量,然后将这些门店数量相加。解决方案如下所示,我将在下面解释一些细节:
POST inventory-local/_search
{
"size": 0,
"aggs": {
"bystore": {
"terms": {
"field": "store.keyword",
"size": 10000
},
"aggs": {
"latest_quantity": {
"scripted_metric": {
"init_script": "params._agg.quantities = new TreeMap()",
"map_script": "params._agg.quantities.put(doc.datetime.date, [doc.datetime.date.millis, doc.quantity.value])",
"combine_script": "return params._agg.quantities.lastEntry().getValue()",
"reduce_script": "def maxkey = 0; def qty = 0; for (a in params._aggs) {def currentKey = a[0]; if (currentKey > maxkey) {maxkey = currentKey; qty = a[1]} } return qty;"
}
}
}
},
"sum_latest_quantities": {
"sum_bucket": {
"buckets_path": "bystore>latest_quantity.value"
}
}
}
}
请注意,为了使其正常工作,您需要在elasticsearch.yml
配置文件中设置script.painless.regex.enabled:true
init_脚本
为每个碎片创建一个TreeMap
。
map\u脚本用日期/数量的映射填充每个碎片上的TreeMap
。我们在映射中输入的值包含单个字符串中的时间戳和数量。稍后在reduce\u脚本中需要该时间戳。
combine\u脚本
只取TreeMap
的最后一个值,因为这是给定碎片的最新数量。
大部分工作位于reduce\u脚本中。我们迭代每个碎片的所有最新数量,并返回最新数量
现在,我们有每家商店的最新数量。剩下要做的就是使用一个sum_bucket
管道聚合来对每个存储数量进行求和。这就是17的结果
响应如下所示:
"aggregations": {
"bystore": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "01",
"doc_count": 2,
"latest_quantity": {
"value": 6
}
},
{
"key": "02",
"doc_count": 2,
"latest_quantity": {
"value": 11
}
}
]
},
"sum_latest_quantities": {
"value": 17
}
}
谢谢你知道我如何在AWS托管的elasticsearch域上更改elasticsearch.yml
文件吗?嗯,好问题。我不确定你真的可以。我知道您可以设置一些白名单设置,但不确定AWS是否允许更改此设置。你应该考虑迁移到替代(由AWS支持,但更灵活)感谢链接!同时,我在AWS论坛上也提出了同样的问题:答案是:我们可能不需要使用regexp来拆分字符串,问题是。我马上报告……我已经更新了答案,只需使用数组而不是字符串。不再需要分裂了。