Scala Spark 2.0 DataFrame: collect multiple rows into an array by column

I have a DataFrame like the one below, and I want to combine multiple rows into an array when their key column values are the same.

val data = Seq(("a","b","sum",0),("a","b","avg",2)).toDF("id1","id2","type","value2")
data.show
+---+---+----+------+
|id1|id2|type|value2|
+---+---+----+------+
|  a|  b| sum|     0|
|  a|  b| avg|     2|
+---+---+----+------+
I want to convert it to the following:

+---+---+----+------+
|id1|id2|agg |value2|
+---+---+----+------+
|  a|  b| 0,2|     0|
+---+---+----+------+
printSchema should look like this:

root
 |-- id1: string (nullable = true)
 |-- id2: string (nullable = true)
 |-- agg: struct (nullable = true)
 |    |-- sum: int (nullable = true)
 |    |-- dc: int (nullable = true)
You can do this:

import org.apache.spark.sql.functions._
// Outside spark-shell, toDF and $ also need: import spark.implicits._

val data = Seq(
  ("a","b","sum",0),("a","b","avg",2)
).toDF("id1","id2","type","value2")

// Per group, take the first non-null value2 for each type and pack both into a struct
val result = data.groupBy($"id1", $"id2").agg(struct(
  first(when($"type" === "sum", $"value2"), true).alias("sum"),
  first(when($"type" === "avg", $"value2"), true).alias("avg")
).alias("agg"))

result.show

+---+---+-----+
|id1|id2|  agg|
+---+---+-----+
|  a|  b|[0,2]|
+---+---+-----+
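
To see why this works, it may help to look at the intermediate columns before aggregation. The sketch below is only an illustration: it assumes Spark is on the classpath and creates its own local session (in spark-shell the session and implicits already exist, so those lines can be dropped). The name perRow is made up for this example. A when(...) without otherwise(...) yields null for rows that do not match, which is why first(..., true) with ignoreNulls picks the one matching value per group.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("when-first-sketch").getOrCreate()
import spark.implicits._

val data = Seq(
  ("a","b","sum",0),("a","b","avg",2)
).toDF("id1","id2","type","value2")

// Each row fills only the column matching its type; the other column is null
val perRow = data.select(
  $"id1", $"id2",
  when($"type" === "sum", $"value2").alias("sum"),
  when($"type" === "avg", $"value2").alias("avg")
)

perRow.show()
```
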

result.printSchema
root
 |-- id1: string (nullable = true)
 |-- id2: string (nullable = true)
 |-- agg: struct (nullable = false)
 |    |-- sum: integer (nullable = true)
 |    |-- avg: integer (nullable = true)
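
If an actual array column is wanted instead of a struct, as the title suggests, collect_list is one option. This is a sketch under the same assumptions as the code above (Spark on the classpath, a local session created only for the example); asArray is a hypothetical name. Note that the order of elements inside each collected array is not guaranteed after the shuffle, so the struct approach is safer when each position must mean a specific type.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("collect-list-sketch").getOrCreate()
import spark.implicits._

val data = Seq(
  ("a","b","sum",0),("a","b","avg",2)
).toDF("id1","id2","type","value2")

// Gather every value2 of a group into a single array<int> column named agg
val asArray = data.groupBy($"id1", $"id2")
  .agg(collect_list($"value2").alias("agg"))

asArray.show()
asArray.printSchema()
```
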

What is value2 in the second table? I mean, value2 has the value 0 there.