Scala: how to explode a StructType in a Spark JSON DataFrame into rows instead of columns
I read a nested JSON with this schema:
root
|-- company: struct (nullable = true)
| |-- 0: string (nullable = true)
| |-- 1: string (nullable = true)
| |-- 10: string (nullable = true)
| |-- 100: string (nullable = true)
| |-- 101: string (nullable = true)
| |-- 102: string (nullable = true)
| |-- 103: string (nullable = true)
| |-- 104: string (nullable = true)
| |-- 105: string (nullable = true)
| |-- 106: string (nullable = true)
| |-- 107: string (nullable = true)
| |-- 108: string (nullable = true)
| |-- 109: string (nullable = true)
When I try:
df.select(col("company.*"))
I get each field of the struct "company" as a column, but I want them as rows: the id in one column and the string in another. Instead of:
0 1 10 100 101 102
"hey" "yooyo" "yuyu" "hey" "yooyo" "yuyu"
I would like to get something like:
id name
0 "hey"
1 "yoooyo"
10 "yuuy"
100 "hey"
101 "yooyo"
102 "yuyu"
Thanks in advance for your help.
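For anyone who wants to reproduce this, a minimal sketch of how such a DataFrame can be built from an inline JSON string (the field names and values below are invented sample data, and `local[*]` is just for a local test):

```scala
import org.apache.spark.sql.SparkSession

// Minimal local setup; the JSON payload is invented sample data
// with the same shape as the schema in the question.
val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// spark.read.json accepts a Dataset[String] of JSON lines (Spark 2.2+).
val df = spark.read.json(Seq(
  """{"company": {"0": "hey", "1": "yooyo", "10": "yuyu"}}"""
).toDS)

df.printSchema()
// company is inferred as a struct with string fields "0", "1", "10"
```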
A tricky attempt using union:
val dfExpl = df.select("company.*")
dfExpl.columns
  .map(name => dfExpl.select(lit(name), col(name)))
  .reduce(_ union _)
  .show()
Or using array/explode:
val dfExpl = df.select("company.*")
val selectExpr = dfExpl
  .columns
  .map(name =>
    struct(
      lit(name).as("id"),
      col(name).as("value")
    ).as("col")
  )
dfExpl
  .select(
    explode(array(selectExpr: _*))
  )
  .select("col.*")
  .show()
Don't know about the others, but for my use case your second solution is indeed faster. Thanks for both solutions. Is it because explode is optimized for this kind of operation?

Make sure to import org.apache.spark.sql.functions.{explode, lit, struct, array, col}
Is there a PySpark version of this answer?