Elasticsearch sum total

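In short, the question is: if I have a top_hits aggregation for each bucket, how do I sum a specific value from the result structure?

Details:

I have many records, and each store has a certain number of them. I want the sum of the latest records across all stores.

To get the latest record per store, I created the following aggregation: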

"latest_quantity_per_store": {
    "aggs": {
        "latest_quantity": {
            "top_hits": {
                "sort": [
                    {
                        "datetime": "desc"
                    },
                    {
                        "quantity": "asc"
                    }
                ],
                "_source": {
                    "includes": [
                        "quantity"
                    ]
                },
                "size": 1
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}

Suppose I have two stores, each with two quantities at different timestamps. This is the result of that aggregation:

"latest_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "O6wFD2UBG8e7nvSU8dYg",
                            "_score": null,
                            "_source": {
                                "quantity": 6
                            },
                            "sort": [
                                1532476800000,
                                6
                            ]
                        }
                    ]
                }
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "pLUFD2UBHBuSGcoH0ZT4",
                            "_score": null,
                            "_source": {
                                "quantity": 11
                            },
                            "sort": [
                                1532476800000,
                                11
                            ]
                        }
                    ]
                }
            }
        }
    ]
}

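Now I would like an aggregation in Elasticsearch that sums the values from these buckets. With the sample data, that is the sum of 6 and 11. I tried the following aggregation: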

"latest_quantity": {
    "sum_bucket": {
        "buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
    }
}

But this results in the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "inventory-local",
        "node": "3z5CqmmAQ-yT2sUCb69DzA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
        }
      }
    ]
  },
  "status": 400
}

What is the correct aggregation to get the number 17 out of Elasticsearch?

I did something similar for another aggregation I have, which is an average rather than a top_hits aggregation:

"average_quantity": {
    "sum_bucket": {
        "buckets_path": "average_quantity_per_store>average_quantity"
    }
},
"average_quantity_per_store": {
    "aggs": {
        "average_quantity": {
            "avg": {
                "field": "quantity"
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}

This works as expected, with the following result:

"average_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "average_quantity": {
                "value": 6
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "average_quantity": {
                "value": 11.5
            }
        }
    ]
},
"average_quantity": {
    "value": 17.5
}
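
Since the sum_bucket attempt over the top_hits result is rejected, one fallback is to add up the values on the client from the top_hits response shown earlier. The following is only a minimal Python sketch under that assumption: the function name `sum_top_hits_field` and the trimmed `RESPONSE` dictionary are hypothetical and exist purely for illustration, and the dictionary keeps only the parts of the response that the function reads.

```python
def sum_top_hits_field(terms_agg, top_hits_name="latest_quantity", field="quantity"):
    """Sum one _source field from the first top_hits hit of every terms bucket."""
    total = 0
    for bucket in terms_agg["buckets"]:
        hits = bucket[top_hits_name]["hits"]["hits"]
        if hits:  # skip buckets whose top_hits came back empty
            total += hits[0]["_source"][field]
    return total


# Trimmed copy of the "latest_quantity_per_store" result from the question.
RESPONSE = {
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "latest_quantity": {"hits": {"hits": [{"_source": {"quantity": 6}}]}},
        },
        {
            "key": "02",
            "doc_count": 2,
            "latest_quantity": {"hits": {"hits": [{"_source": {"quantity": 11}}]}},
        },
    ]
}

print(sum_top_hits_field(RESPONSE))
```

This keeps the query unchanged but moves the final addition out of Elasticsearch, so it only fits cases where the full bucket list is returned to the client anyway.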

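There is a way to solve this by combining a scripted_metric aggregation with a sum_bucket pipeline aggregation. The scripted metric aggregation is a bit complex, but the main idea is that it lets you supply your own bucketing algorithm and emit a single metric value from it.

In your case, what you want to do is work out the latest quantity for each store and then add those store quantities together. The solution looks like this, and I explain some of the details below: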

POST inventory-local/_search
{
  "size": 0,
  "aggs": {
    "bystore": {
      "terms": {
        "field": "store.keyword",
        "size": 10000
      },
      "aggs": {
        "latest_quantity": {
          "scripted_metric": {
            "init_script": "params._agg.quantities = new TreeMap()",
            "map_script": "params._agg.quantities.put(doc.datetime.date, [doc.datetime.date.millis, doc.quantity.value])",
            "combine_script": "return params._agg.quantities.lastEntry().getValue()",
            "reduce_script": "def maxkey = 0; def qty = 0; for (a in params._aggs) {def currentKey = a[0]; if (currentKey > maxkey) {maxkey = currentKey; qty = a[1]} } return qty;"
          }
        }
      }
    },
    "sum_latest_quantities": {
      "sum_bucket": {
        "buckets_path": "bystore>latest_quantity.value"
      }
    }
  }
}

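Note that the original version of this answer required setting script.painless.regex.enabled: true in the elasticsearch.yml configuration file, because it split a string with a regular expression. The script above stores an array instead, so that setting should no longer be necessary.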

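The init_script creates a TreeMap for each shard.

The map_script populates the TreeMap on each shard, keyed by date. The value we put into the map is an array holding the timestamp and the quantity. We will need that timestamp later in the reduce_script.

The combine_script simply takes the last value of the TreeMap, since that is the latest quantity for the given shard.

Most of the work happens in the reduce_script. We iterate over the latest quantities from all shards and return the most recent one.

At this point we have the latest quantity for each store. All that is left is to use a sum_bucket pipeline aggregation to sum the per-store quantities, which gives the result of 17.

The response looks like this:
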
 "aggregations": {
    "bystore": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "01",
          "doc_count": 2,
          "latest_quantity": {
            "value": 6
          }
        },
        {
          "key": "02",
          "doc_count": 2,
          "latest_quantity": {
            "value": 11
          }
        }
      ]
    },
    "sum_latest_quantities": {
      "value": 17
    }
  }
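
To make the four script phases easier to follow, here is a rough pure-Python simulation of the same logic. It is only a sketch and not how Elasticsearch executes Painless: the `Doc` tuple, the two-shard layout in `SHARDS`, the older timestamp 1532390400000, and the older quantities (6 and 12, picked to match the averages in the question) are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical document shape; only the fields used by the scripts are modeled.
Doc = namedtuple("Doc", ["store", "datetime_millis", "quantity"])


def init_script():
    # One state object per shard (a plain dict stands in for the TreeMap).
    return {"quantities": {}}


def map_script(state, doc):
    # Key by timestamp, store [timestamp, quantity] as the value.
    state["quantities"][doc.datetime_millis] = [doc.datetime_millis, doc.quantity]


def combine_script(state):
    # Equivalent of TreeMap.lastEntry().getValue(): the entry with the largest key.
    return state["quantities"][max(state["quantities"])]


def reduce_script(shard_results):
    # Keep the quantity that belongs to the most recent timestamp across shards.
    maxkey, qty = 0, 0
    for current_key, current_qty in shard_results:
        if current_key > maxkey:
            maxkey, qty = current_key, current_qty
    return qty


def sum_latest_quantities(shards):
    """Return (latest quantity per store, sum of those quantities)."""
    stores = sorted({doc.store for shard in shards for doc in shard})
    per_store = {}
    for store in stores:
        shard_results = []
        for shard in shards:
            docs = [doc for doc in shard if doc.store == store]
            if not docs:
                continue  # this shard holds no documents for the bucket
            state = init_script()
            for doc in docs:
                map_script(state, doc)
            shard_results.append(combine_script(state))
        per_store[store] = reduce_script(shard_results)
    return per_store, sum(per_store.values())


# Assumed layout: each store's two documents land on different shards.
SHARDS = [
    [Doc("01", 1532390400000, 6), Doc("02", 1532476800000, 11)],
    [Doc("01", 1532476800000, 6), Doc("02", 1532390400000, 12)],
]

print(sum_latest_quantities(SHARDS))
```

The final `sum(...)` plays the role of the sum_bucket pipeline aggregation, and the per-store dictionary corresponds to the `latest_quantity` values in the buckets above.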

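Thanks! Do you know how I can change the elasticsearch.yml file on an AWS-hosted Elasticsearch domain?

Hmm, good question. I'm not sure you actually can. I know you can set some whitelisted settings, but I'm not sure AWS allows changing this one. You should consider migrating to an alternative (backed by AWS, but more flexible).

Thanks for the link! In the meantime, I have also asked the same question on the AWS forum: the answer is:

The thing is, we might not need a regexp to split the string at all. I'll report back shortly...

I have updated my answer to simply use an array instead of a string. No splitting is needed anymore.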