Scala Spark: collect lists within a range into one row

Tags: scala, apache-spark-2.0

Suppose I have a DataFrame like the following:

+----+----------+----+----------------+
|colA|      colB|colC|            colD|
+----+----------+----+----------------+
|   1|2020-03-24|  21|[0.0, 2.49, 3.1]|
|   1|2020-03-17|  20|[1.0, 2.49, 3.1]|
|   1|2020-03-10|  19|[2.0, 2.49, 3.1]|
|   2|2020-03-24|  21|[0.0, 2.49, 3.1]|
|   2|2020-03-17|  20|[1.0, 2.49, 3.1]|
+----+----------+----+----------------+
I want to collect the colD lists into one row, but only the lists that fall within a range of colC values relative to the current row:

Output:

+----+----------+----+----------------+------------------------------------+
|colA|colB      |colC|colD            |colE                                |
+----+----------+----+----------------+------------------------------------+
|1   |2020-03-24|21  |[0.0, 2.49, 3.1]|[[0.0, 2.49, 3.1], [1.0, 2.49, 3.1]]|
|1   |2020-03-17|20  |[1.0, 2.49, 3.1]|[[1.0, 2.49, 3.1], [2.0, 2.49, 3.1]]|
|1   |2020-03-10|19  |[2.0, 2.49, 3.1]|[[2.0, 2.49, 3.1]]                  |
|2   |2020-03-24|21  |[0.0, 2.49, 3.1]|[[0.0, 2.49, 3.1], [1.0, 2.49, 3.1]]|
|2   |2020-03-17|20  |[1.0, 2.49, 3.1]|[[1.0, 2.49, 3.1]]                  |
+----+----------+----+----------------+------------------------------------+

I tried the code below, but it fails with this error:

cannot resolve 'RANGE BETWEEN CAST((`colC` - 2) AS STRING) FOLLOWING AND CAST(`colC` AS STRING) FOLLOWING' due to data type mismatch: Window frame lower bound 'cast((colC#575 - 2) as string)' is not a literal.;;
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._

val data = Seq(
  ("1", "2020-03-24", 21, List(0.0, 2.49, 3.1)),
  ("1", "2020-03-17", 20, List(1.0, 2.49, 3.1)),
  ("1", "2020-03-10", 19, List(2.0, 2.49, 3.1)),
  ("2", "2020-03-24", 21, List(0.0, 2.49, 3.1)),
  ("2", "2020-03-17", 20, List(1.0, 2.49, 3.1))
)
val rdd = spark.sparkContext.parallelize(data)
val df = rdd.toDF("colA", "colB", "colC", "colD")
df.show()

// This is the attempt that produces the error above:
df.withColumn("colE", collect_list("colD").over(
    Window.partitionBy("colA")
      .orderBy("colB")
      .rangeBetween($"colC" - lit(2), $"colC")))
  .show(false)
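The error message points at the cause: `rangeBetween` requires its frame bounds to be literals, interpreted as offsets on the `orderBy` expression, so column-valued bounds like `$"colC" - 2` are rejected (and ordering by the string column `colB` forces the bounds to be cast to string). One way to rephrase the window, a sketch and not from the original post, is to order by the numeric `colC` itself, descending, and use literal offsets. With a descending order, a frame of `(0, 1)` covers rows whose `colC` is between the current row's value and one below it, which matches the sample output shown above:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Order by the numeric colC descending; literal bounds are offsets on colC.
// Frame (0, 1) = current row plus rows whose colC is up to 1 smaller.
val win = Window.partitionBy("colA")
  .orderBy(col("colC").desc)
  .rangeBetween(Window.currentRow, 1)

val df1 = df.withColumn("colE", collect_list("colD").over(win))
df1.show(false)
```

If the intent really was a window of `colC - 2` to `colC` (as in the failing attempt), widening the frame to `rangeBetween(Window.currentRow, 2)` should express that; the sample output, however, only ever includes one step below the current row.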