How to search and group in ElasticSearch?
I have an ElasticSearch index with records like the following:
{
"project" : "A",
"updated" : <date>,
"cost" : 123
},
{
"project" : "A",
"updated" : <date>,
"cost" : 1
},
{
"project" : "B",
"updated" : <date>,
"cost" : 3
},
{
"project" : "B",
"updated" : <date>,
"cost" : 4
},
{
"project" : "C",
"updated" : <date>,
"cost" : 5
}
I don't know how to modify this query, which fetches the data for a single project:
"query": {
"bool": {
"must": [
{
"match": {
"project": {
"query": <project>,
"type": "phrase"
}
}
},
{
"range": {
"updated": {
"gte": <startDate>,
"format": "epoch_millis"
}
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"2": {
"sum": {
"field": "cost"
}
}
}
}
}
But it returns some strange results:
"aggregations": {
"3": {
"buckets": [
{
"key_as_string": "2017-02-01T00:00:00.000-06:00",
"key": 1485928800000,
"doc_count": 17095,
"project_agg": {
"doc_count_error_upper_bound": 36,
"sum_other_doc_count": 3503,
"buckets": [
{
"2": {
"value": 2536.8616891294323
},
"key": 834879987748,
"doc_count": 2243
},
{
"2": {
"value": 3438.766646153458
},
"key": 497952557271,
"doc_count": 1785
},
{
"2": {
"value": 13066.367076588496
},
"key": 1057394416300,
"doc_count": 1736
},
...
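There are 10 buckets per month here. I expected to see only 2 values, one per project. What went wrong?

Answer:
You need to aggregate on the project before summing the cost: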
{
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"2": {
"terms": {
"field": "project"
},
"aggs": {
"1": {
"sum": {
"field": "cost"
}
}
}
}
}
}
}
}
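As a minimal sketch, the same nesting (month, then project, then sum of cost) can be built as a Python dict and serialized to JSON before sending it to the `_search` endpoint. The function name `monthly_cost_by_project` and its parameters are hypothetical, introduced only for illustration. The field names and aggregation keys are copied from the query above.

```python
import json


def monthly_cost_by_project(project_field="project", date_field="End_Time"):
    """Build a search body that buckets by month, then by project, then sums cost.

    Hypothetical helper: the name and parameters are not part of any library.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the matching documents
        "aggs": {
            "3": {
                "date_histogram": {
                    "field": date_field,
                    "interval": "1M",
                    "time_zone": "CST6CDT",
                    "min_doc_count": 1,
                },
                "aggs": {
                    # one bucket per distinct value of the project field
                    "2": {
                        "terms": {"field": project_field},
                        # total cost inside each (month, project) bucket
                        "aggs": {"1": {"sum": {"field": "cost"}}},
                    }
                },
            }
        },
    }


body = monthly_cost_by_project()
print(json.dumps(body, indent=2))
```

Keeping the field name as a parameter makes it easy to point the terms aggregation at a non-analyzed sub-field, if the mapping has one.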
For filtering, it depends on how you want to search. To restrict the search to a list of projects, you can do the following:
"query": {
"bool": {
"must": [
{ "terms": { "project": [ "a", "b" ] } } //Assuming field is mapped as "analyzed"
]
}
}
If the mapping contains a .keyword variant, write the terms filter as:
{ "terms": { "project.keyword": [ "A", "B" ] } } //Assuming field is mapped as "not_analyzed" or has a keyword field.
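In ES 5.5 a field can be mapped as "text" with a "keyword" sub-field (the "ShortTextContent" mapping snippet appears further down this page). In that case I can use "ShortTextContent" to access the analyzed version and "ShortTextContent.keyword" to access the non-analyzed version.

Answer:
The query you wrote gives the total cost per month, regardless of project. You need another aggregation between aggregation 3 and aggregation 2 to group by project.
If you only want the values for projects A and B, use a filter in the aggregation: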
"size": 0,
"aggs": {
"project": {
"filter": {
"bool": {
"must": [
{
"terms": {
"project": [
"a",
"b"
]
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"project_agg": {
"terms": {
"field": "project"
},
"aggs": {
"2": {
"sum": {
"field": "cost"
}
}
}
}
}
}
}
}
}
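To read the result of this request, the nested buckets have to be walked in the same order as the aggregations: filter, then month, then project. The sketch below assumes the response follows the usual shape for a filter aggregation wrapping a date_histogram and a terms aggregation. The helper name `flatten_monthly_costs` is hypothetical, and the sample response is made up for illustration: it uses the costs from the question (123 + 1 for A, 3 + 4 for B) and assumes that all four documents fall in the same month.

```python
def flatten_monthly_costs(response):
    """Turn the nested aggregation response into (month, project, total_cost) rows.

    Hypothetical helper; the keys "project", "3", "project_agg" and "2"
    match the aggregation names used in the request above.
    """
    rows = []
    month_buckets = response["aggregations"]["project"]["3"]["buckets"]
    for month in month_buckets:
        for project in month["project_agg"]["buckets"]:
            rows.append((month["key_as_string"], project["key"], project["2"]["value"]))
    return rows


# Illustrative response, not real output from a cluster.
sample_response = {
    "aggregations": {
        "project": {
            "doc_count": 4,
            "3": {
                "buckets": [
                    {
                        "key_as_string": "2017-02-01T00:00:00.000-06:00",
                        "key": 1485928800000,
                        "doc_count": 4,
                        "project_agg": {
                            "doc_count_error_upper_bound": 0,
                            "sum_other_doc_count": 0,
                            "buckets": [
                                {"key": "a", "doc_count": 2, "2": {"value": 124.0}},
                                {"key": "b", "doc_count": 2, "2": {"value": 7.0}},
                            ],
                        },
                    }
                ]
            },
        }
    }
}

for row in flatten_monthly_costs(sample_response):
    print(row)
```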
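Comments:
Thank you very much. But when I try to use "terms" I get an empty result, even though the values are exactly the same. Do you know why? It looks like "should" + "match" works. I'm using version 5.0.
Can you add the mapping for the object you are trying to report on? If the terms query returns zero results, it is probably because the field is mapped as "analyzed" (which means it will usually lowercase the text). For example, if the project names "A" and "B" are indexed with an analyzed mapping, they are probably stored as "a" and "b", and you have to use the lowercase variants in the terms query. In later Elastic versions you can access the non-analyzed version through the "fieldname.keyword" field, i.e. "project.keyword". You can also specify this in the mapping before indexing your data.
Thanks! I changed "Project" to "Project.raw" and now it works fine.
Thanks! I tried it, but got strange results. Do you know why? (I updated the post.)
I updated the aggs query to filter only projects "a" and "b". This should work.
I found it. The problem was that my "Project" field was "analyzed" rather than "aggregatable". So the agg didn't throw any error, but returned something strange. I changed "Project" to "Project.raw" and now it works fine. Thanks for your help!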
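The lowercase behaviour described in the comments can be illustrated without a cluster. The following is a toy simulation, not Elasticsearch code: `analyze` is a rough stand-in for an analyzed text field (split into words, lowercase), `build_index` and `terms_lookup` are hypothetical helpers, and the lookup compares values exactly, the way a terms query does not analyze its input.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzed field: split into word tokens and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(values, analyzed):
    """Map each indexed term to the set of document ids that contain it."""
    index = {}
    for doc_id, value in enumerate(values):
        terms = analyze(value) if analyzed else [value]  # keyword-style: store as-is
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def terms_lookup(index, wanted):
    """Exact-match lookup: the query values are NOT analyzed."""
    hits = set()
    for value in wanted:
        hits |= index.get(value, set())
    return sorted(hits)


projects = ["A", "A", "B", "B", "C"]  # project values from the question
analyzed_index = build_index(projects, analyzed=True)
keyword_index = build_index(projects, analyzed=False)

print(terms_lookup(analyzed_index, ["A", "B"]))  # uppercase against lowercased terms
print(terms_lookup(analyzed_index, ["a", "b"]))  # lowercase matches the stored terms
print(terms_lookup(keyword_index, ["A", "B"]))   # exact values match the keyword-style index
```

In this toy model the first lookup finds nothing, which mirrors the empty "terms" result reported above, while the other two find the four A and B documents.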
"query": {
"bool": {
"must": [
{ "terms": { "project": [ "a", "b" ] } } //Assuming field is mapped as "analyzed"
]
}
}
"ShortTextContent" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
"size": 0,
"aggs": {
"project": {
"filter": {
"bool": {
"must": [
{
"terms": {
"project": [
"a",
"b"
]
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"project_agg": {
"terms": {
"field": "project"
},
"aggs": {
"2": {
"sum": {
"field": "cost"
}
}
}
}
}
}
}
}
}