Apache Spark: converting nested JSON values into a DataFrame

I want to convert the values nested inside a JSON row into a new DataFrame.

val rd1 = spark.read.option("multiLine", "true").option("mode", "PERMISSIVE").json("data.json")

import org.apache.spark.sql.functions._

val ds1 = rd1.select("alpha._id", "alpha.Description", "alpha.Sub-Tower", "alpha.Tower", "alpha.input_data")

ds1.show() // gives only a single row with an array in each column; I need a table of 4 rows instead
The code above is my approach 1. Expected output:

The alpha table should look like this:

+-----------+---------+-----+---+----------------+
|Description|Sub-Tower|Tower|_id|      input_data|
+-----------+---------+-----+---+----------------+
|      a b,c|      crt|A B C| 27|alpha beta gamma|
|      a b,c|      crt|A B C| 91|alpha beta gamma|
|      a b,c|      crt|A B C| 21|alpha beta gamma|
|      a b,c|      crt|A B C| 29|alpha beta gamma|
+-----------+---------+-----+---+----------------+

Below is Scala code that explodes the column alpha:

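// Read the input as in the question; any DataFrame with an array column `alpha` works here
val df = spark.read.option("multiLine", "true").json("data.json")

import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"col" syntax; `spark` is the SparkSession

// explode emits one row per element of the alpha array;
// "alpha.*" then expands the fields of each element into separate columns
val result = df.select(explode($"alpha").as("alpha")).select("alpha.*")

result.printSchema()
result.show()

Output: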
root
 |-- Description: string (nullable = true)
 |-- Sub-Tower: string (nullable = true)
 |-- Tower: string (nullable = true)
 |-- _id: string (nullable = true)
 |-- input_data: string (nullable = true)

+-----------+---------+-----+---+----------------+
|Description|Sub-Tower|Tower|_id|      input_data|
+-----------+---------+-----+---+----------------+
|      a b,c|      crt|A B C| 27|alpha beta gamma|
|      a b,c|      crt|A B C| 91|alpha beta gamma|
|      a b,c|      crt|A B C| 21|alpha beta gamma|
|      a b,c|      crt|A B C| 29|alpha beta gamma|
+-----------+---------+-----+---+----------------+
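
The original data.json is not shown in the question, so the following is only a minimal sketch under an assumed input shape: a single JSON object whose "alpha" field is an array of records with the five fields listed in the schema above. The names sampleJson, nested and viaInline are hypothetical and used only for illustration. The sketch shows why selecting "alpha._id" directly yields a single row of arrays, how explode turns the array into one row per element, and that the built-in SQL generator inline is a possible alternative for arrays of structs. It can be pasted into spark-shell (where the SparkSession lines are redundant) or run as a script.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Local session for the sketch; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-alpha-sketch")
  .getOrCreate()
import spark.implicits._

// ASSUMPTION: the input is one JSON object with an "alpha" array of records.
// The field values are copied from the expected output table in the question.
val records = Seq("27", "91", "21", "29").map { id =>
  s"""{"_id":"$id","Description":"a b,c","Sub-Tower":"crt","Tower":"A B C","input_data":"alpha beta gamma"}"""
}
val sampleJson = records.mkString("""{"alpha":[""", ",", "]}")

// Parse the JSON string instead of a file so the sketch is self-contained.
val df = spark.read.json(Seq(sampleJson).toDS())

// Selecting a field through the array keeps one row per JSON document,
// so the column holds an array of all _id values (the behaviour seen in the question).
val nested = df.select("alpha._id")

// explode emits one row per array element; "alpha.*" expands the struct fields into columns.
val result = df.select(explode(col("alpha")).as("alpha")).select("alpha.*")

// Alternative: the SQL generator function inline explodes an array of structs into columns directly.
val viaInline = df.selectExpr("inline(alpha)")

result.show()
```

Both result and viaInline contain one row per element of alpha, whereas nested keeps a single row. If the real file has a different layout (for example several top-level objects), the read step would need adjusting, but the explode step stays the same as long as alpha is an array of structs.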