Scala: retrieving aggregations/buckets with Spark and Elasticsearch
GET test_data/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"company": "foo"}}
      ]
    }
  },
  "size": 0,
  "aggs": {
    "filenames": {
      "terms": {
        "field": "filename.keyword"
      },
      "aggs": {
        "maxDate": {"max": {"field": "timestamp"}},
        "minDate": {"min": {"field": "timestamp"}}
      }
    }
  }
}
When entered in the Kibana Dev Tools console, this query returns the desired result:
{
  "took": 1052,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 52120825,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "filenames": {
      "doc_count_error_upper_bound": 97326,
      "sum_other_doc_count": 51389890,
      "buckets": [
        {
          "key": "Messages_20170711_080003.mes",
          "doc_count": 187131,
          "minDate": {
            "value": 1499724098000,
            "value_as_string": "2017-07-10T22:01:38.000Z"
          },
          "maxDate": {
            "value": 1499760002000,
            "value_as_string": "2017-07-11T08:00:02.000Z"
          }
        },
        {
          "key": "Messages_20170213_043108.mes",
          "doc_count": 115243,
          "minDate": {
            "value": 1486735453000,
            "value_as_string": "2017-02-10T14:04:13.000Z"
          },
          "maxDate": {
            "value": 1486960265000,
            "value_as_string": "2017-02-13T04:31:05.000Z"
          }
        },
When I try to retrieve the results with elasticsearch-spark, however, the dataframe shows me all the hits instead of the buckets containing the aggregations.
How can I get the results delivered by the aggregations/buckets into a dataframe?
One workaround is to perform the aggregation in Spark itself:
val df = spark.sqlContext.esDF(esInputIndexName, query = queryString)
df.show(10, false)
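The workaround can be sketched end to end. The following is a minimal, self-contained example: a local Spark session and a few in-memory rows stand in for the dataframe that `esDF` would load, and the column names `filename` and `timestamp` are taken from the mapping in the question. The `groupBy`/`agg` combination mirrors the `terms` aggregation with its `min`/`max` sub-aggregations and the per-bucket `doc_count`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{count, max, min}

object AggregateBuckets {
  // Mirrors the Elasticsearch aggregation: terms on filename,
  // min/max on timestamp, plus the per-bucket doc_count.
  def bucketize(hits: DataFrame): DataFrame =
    hits
      .groupBy("filename")
      .agg(
        count("*").as("doc_count"),
        min("timestamp").as("minDate"),
        max("timestamp").as("maxDate"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("agg-buckets")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the esDF result: epoch-millis timestamps per filename.
    val hits = Seq(
      ("Messages_20170711_080003.mes", 1499724098000L),
      ("Messages_20170711_080003.mes", 1499760002000L),
      ("Messages_20170213_043108.mes", 1486735453000L)
    ).toDF("filename", "timestamp")

    bucketize(hits).show(10, false)
    spark.stop()
  }
}
```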
Thanks for the answer, that is roughly what I have been doing so far. The runtime is a pain (20+ minutes). It would speed things up considerably if the data were filtered in the query itself.
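On the filtering point: the elasticsearch-hadoop connector does send the `query` clause to Elasticsearch, so only matching documents are shipped to Spark; the `aggs` and `size` clauses, however, are not honoured, because the connector streams raw hits rather than aggregation results. A hedged sketch, assuming the `spark` session and `esInputIndexName` from the question and a running cluster (not runnable standalone):

```scala
import org.elasticsearch.spark.sql._

// Only the "query" clause is pushed down to Elasticsearch; the
// aggregation still has to be recomputed in Spark afterwards.
val queryString =
  """{"query": {"bool": {"must": [
    |  {"match": {"company": "foo"}}
    |]}}}""".stripMargin

val df = spark.sqlContext.esDF(esInputIndexName, query = queryString)
```

This keeps the transfer limited to documents matching `company: foo`, which is where most of the 20+ minutes is likely being spent.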