Scala: pivot a multi-level Spark dataset


I am combining datasets, which results in the following schema:

root
 |-- from: struct (nullable = false)
 |    |-- id: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- tags: string (nullable = true)
 |-- v1: struct (nullable = false)
 |    |-- id: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- tags: string (nullable = true)
 |-- v2: struct (nullable = false)
 |    |-- id: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- tags: string (nullable = true)
 |-- v3: struct (nullable = false)
 |    |-- id: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- tags: string (nullable = true)
 |-- to: struct (nullable = false)
 |    |-- id: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- tags: string (nullable = true)

How can I create a table from this dataset in Scala with just three columns (id, name, and tags)?

Just combine all of the columns into an array, explode it, and then select all of the nested fields:

import org.apache.spark.sql.functions.{array, col, explode}

case class Vertex(id: String, name: String, tags: String)

// .toDF requires the implicits of an active SparkSession, e.g. import spark.implicits._
val df = Seq((
  Vertex("1", "from", "a"), Vertex("2", "V1", "b"), Vertex("3", "V2", "c"),
  Vertex("4", "v3", "d"), Vertex("5", "to", "e")
)).toDF("from", "v1", "v2", "v3", "to")


df.select(explode(array(df.columns map col: _*)).alias("col")).select("col.*")
which gives the following result:

+---+----+----+
| id|name|tags|
+---+----+----+
|  1|from|   a|
|  2|  V1|   b|
|  3|  V2|   c|
|  4|  v3|   d|
|  5|  to|   e|
+---+----+----+
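
If you prefer to avoid the intermediate array column, an equivalent approach (a sketch, assuming the same df and Vertex as above) is to select the nested fields of each struct column separately and union the pieces; you can also map the flattened rows back onto Vertex to get a typed Dataset:

// Select id, name and tags out of each struct column, then union the pieces.
// Assumes the same df as above; import spark.implicits._ is needed for .as[Vertex].
val flattened = df.columns
  .map(c => df.select(col(s"$c.id"), col(s"$c.name"), col(s"$c.tags")))
  .reduce(_ union _)

// Optionally turn the flat DataFrame into a typed Dataset[Vertex].
val vertices = flattened.as[Vertex]
vertices.show()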
