
Scala: retrieving aggregations/buckets with Spark


Example query:

GET test_data/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "company": "foo" } }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "filenames": {
      "terms": {
        "field": "filename.keyword"
      },
      "aggs": {
        "maxDate": { "max": { "field": "timestamp" } },
        "minDate": { "min": { "field": "timestamp" } }
      }
    }
  }
}
When entered in the Kibana Dev Tools console, this query returns the desired result:

{
  "took": 1052,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 52120825,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "filenames": {
      "doc_count_error_upper_bound": 97326,
      "sum_other_doc_count": 51389890,
      "buckets": [
        {
          "key": "Messages_20170711_080003.mes",
          "doc_count": 187131,
          "minDate": {
            "value": 1499724098000,
            "value_as_string": "2017-07-10T22:01:38.000Z"
          },
          "maxDate": {
            "value": 1499760002000,
            "value_as_string": "2017-07-11T08:00:02.000Z"
          }
        },
        {
          "key": "Messages_20170213_043108.mes",
          "doc_count": 115243,
          "minDate": {
            "value": 1486735453000,
            "value_as_string": "2017-02-10T14:04:13.000Z"
          },
          "maxDate": {
            "value": 1486960265000,
            "value_as_string": "2017-02-13T04:31:05.000Z"
          }
        },
When I try to retrieve the results with elasticsearch-spark, however, the DataFrame shows me all of the hits rather than the buckets containing the aggregations.
How can I get the results delivered by the aggregations/buckets into a DataFrame?
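As far as I can tell, the elasticsearch-hadoop connector only materializes documents (hits), not aggregation results, which is why the buckets never reach the DataFrame. One workaround is to POST the query to the _search REST endpoint yourself and let Spark parse the buckets out of the response. The sketch below is not from the original post: the cluster address localhost:9200 is an assumption, and plain HttpURLConnection stands in for whatever HTTP client you prefer.

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import org.apache.spark.sql.functions.explode

// POST the aggregation query straight to the _search endpoint
// (host/port are assumptions; adjust to your cluster).
val body =
  """{"query":{"bool":{"must":[{"match":{"company":"foo"}}]}},
    |"size":0,
    |"aggs":{"filenames":{"terms":{"field":"filename.keyword"},
    |"aggs":{"maxDate":{"max":{"field":"timestamp"}},
    |"minDate":{"min":{"field":"timestamp"}}}}}}""".stripMargin

val conn = new URL("http://localhost:9200/test_data/_search")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setRequestProperty("Content-Type", "application/json")
conn.setDoOutput(true)
val out = conn.getOutputStream
out.write(body.getBytes(StandardCharsets.UTF_8))
out.close()

// Raw response: the same JSON Kibana shows, including the buckets.
val json = scala.io.Source.fromInputStream(conn.getInputStream).mkString

// Let Spark infer the schema and flatten the buckets into rows.
import spark.implicits._
val buckets = spark.read.json(Seq(json).toDS())
  .select(explode($"aggregations.filenames.buckets").as("b"))
  .select($"b.key".as("filename"), $"b.doc_count",
          $"b.minDate.value_as_string".as("minDate"),
          $"b.maxDate.value_as_string".as("maxDate"))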

One workaround is to perform the aggregation in Spark itself:

import org.elasticsearch.spark.sql._  // brings the esDF implicit into scope

val df = spark.sqlContext.esDF(esInputIndexName, query = queryString)
df.show(10, false)
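For reference, the Spark-side counterpart of the terms aggregation with its min/max sub-aggregations might look like the sketch below; the column names filename and timestamp follow the mapping in the question.

import org.apache.spark.sql.functions.{count, max, min}

// One row per filename with the document count and the timestamp range,
// mirroring the terms aggregation and its maxDate/minDate sub-aggregations.
val byFile = df
  .groupBy("filename")
  .agg(
    count("*").as("doc_count"),
    min("timestamp").as("minDate"),
    max("timestamp").as("maxDate"))

byFile.show(10, false)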


Thanks for your answer, that is roughly what I have been doing so far. The runtime is a pain (20+ minutes); filtering the data inside the query would speed things up considerably.
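Along those lines, the match filter from the original query can be pushed down to Elasticsearch by passing it as the query argument of esDF, so only matching documents are shipped to Spark. A sketch, with the index name test_data taken from the question:

import org.elasticsearch.spark.sql._

// Only documents matching the filter leave the cluster; the (much cheaper)
// groupBy/agg above then runs over the reduced DataFrame in Spark.
val queryString = """{"query":{"bool":{"must":[{"match":{"company":"foo"}}]}}}"""
val df = spark.sqlContext.esDF("test_data", query = queryString)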