Scala: how to explode a StructType in a Spark JSON DataFrame into rows instead of columns
I read a nested JSON with this schema:
root
|-- company: struct (nullable = true)
| |-- 0: string (nullable = true)
| |-- 1: string (nullable = true)
| |-- 10: string (nullable = true)
| |-- 100: string (nullable = true)
| |-- 101: string (nullable = true)
| |-- 102: string (nullable = true)
| |-- 103: string (nullable = true)
| |-- 104: string (nullable = true)
| |-- 105: string (nullable = true)
| |-- 106: string (nullable = true)
| |-- 107: string (nullable = true)
| |-- 108: string (nullable = true)
| |-- 109: string (nullable = true)
When I try:
df.select(col("company.*"))
I get each field of the struct "company" as a column, but I want them as rows: the id in one column and the string in another. Instead of:
0 1 10 100 101 102
"hey" "yooyo" "yuyu" "hey" "yooyo" "yuyu"
I would like to get something like:
id name
0 "hey"
1 "yoooyo"
10 "yuuy"
100 "hey"
101 "yooyo"
102 "yuyu"
Thanks in advance for your help.
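For anyone who wants to reproduce this, a minimal sketch of how such a DataFrame can be built from an inline JSON string (the field names and values below are invented sample data, and `local[*]` is just for a local test):

```scala
import org.apache.spark.sql.SparkSession

// Minimal local setup; the JSON payload is invented sample data
// with the same shape as the schema in the question.
val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// spark.read.json accepts a Dataset[String] of JSON lines (Spark 2.2+).
val df = spark.read.json(Seq(
  """{"company": {"0": "hey", "1": "yooyo", "10": "yuyu"}}"""
).toDS)

df.printSchema()
// company is inferred as a struct with string fields "0", "1", "10"
```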
A tricky attempt using union:
val dfExpl = df.select("company.*")
dfExpl.columns
  .map(name => dfExpl.select(lit(name), col(name)))
  .reduce(_ union _)
  .show()
Or using array/explode:
val dfExpl = df.select("company.*")
val selectExpr = dfExpl
  .columns
  .map(name =>
    struct(
      lit(name).as("id"),
      col(name).as("value")
    ).as("col")
  )
dfExpl
  .select(
    explode(array(selectExpr: _*))
  )
  .select("col.*")
  .show()
Don't know about the others, but for my use case your second solution is indeed faster. Thanks for both solutions. Is it because explode is optimized for this kind of operation?

Make sure to import org.apache.spark.sql.functions.{explode, lit, struct, array, col}
Is there a PySpark version of this answer?