Scala SparkSQL dataframe


Data structure:

{"Emp":{"Name":"John", "Sal":"2000", "Address":[{"loc":"Sanjose","Zip":"222"},{"loc":"dayton","Zip":"333"}]}}
Now I want to load the data into a dataframe and append Zip to loc. The column name should stay the same (loc). The transformed data should look like this:

{"Emp":{"Name":"John", "Sal":"2000", "Address":[{"loc":"Sanjose222","Zip":"222"},{"loc":"dayton333","Zip":"333"}]}}

No RDDs. I need a dataframe operation to achieve this, preferably using the withColumn function. How can I do this?

Given a data structure as

val jsonString = """{"Emp":{"Name":"John","Sal":"2000","Address":[{"loc":"Sanjose","Zip":"222"},{"loc":"dayton","Zip":"333"}]}}"""
you can convert it to a dataframe as

val df = spark.read.json(sc.parallelize(jsonString::Nil))
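Note that the RDD-based overload of read.json is deprecated since Spark 2.2; on newer versions an equivalent one-liner (a sketch, assuming import spark.implicits._ is in scope, as it is in spark-shell) is

val df = spark.read.json(Seq(jsonString).toDS)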
That would give you

+-----------------------------------------------------+
|Emp                                                  |
+-----------------------------------------------------+
|[WrappedArray([222,Sanjose], [333,dayton]),John,2000]|
+-----------------------------------------------------+

//root
// |-- Emp: struct (nullable = true)
// |    |-- Address: array (nullable = true)
// |    |    |-- element: struct (containsNull = true)
// |    |    |    |-- Zip: string (nullable = true)
// |    |    |    |-- loc: string (nullable = true)
// |    |-- Name: string (nullable = true)
// |    |-- Sal: string (nullable = true)
Now, to get the desired output, you need to split the struct Emp column into separate columns and transform the Address array column with a udf function, as sketched below.

The address used inside the udf function is a case class:
case class address(loc: String, Zip: String)
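The transformation code itself did not survive in this copy of the answer, so here is a minimal sketch that reproduces the output shown below. The names concatZip and result are mine, and it assumes a spark-shell session (so spark, sc and the spark.implicits._ import are already available):

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{struct, udf}

// For each element of the Address array, append Zip to loc and keep Zip unchanged.
// An array-of-struct column arrives in a udf as Seq[Row]; map it to the address case class.
val concatZip = udf((addresses: Seq[Row]) => addresses.map(a =>
  address(a.getAs[String]("loc") + a.getAs[String]("Zip"), a.getAs[String]("Zip"))))

val result = df
  .select($"Emp.Name".as("Name"), $"Emp.Sal".as("Sal"), $"Emp.Address".as("Address"))
  .withColumn("Address", concatZip($"Address"))          // same column name (loc), new values
  .select(struct($"Name", $"Sal", $"Address").as("Emp")) // reassemble the Emp struct

result.show(false)
result.printSchema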
which should give you

+-----------------------------------------------------------+
|Emp                                                        |
+-----------------------------------------------------------+
|[John,2000,WrappedArray([Sanjose222,222], [dayton333,333])]|
+-----------------------------------------------------------+

//root
// |-- Emp: struct (nullable = false)
// |    |-- Name: string (nullable = true)
// |    |-- Sal: string (nullable = true)
// |    |-- Address: array (nullable = true)
// |    |    |-- element: struct (containsNull = true)
// |    |    |    |-- loc: string (nullable = true)
// |    |    |    |-- Zip: string (nullable = true)
Now, to get the json back, simply use .toJSON, and you should get
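For example, with the result dataframe from the sketch above (the name is mine, not from the original answer):

result.toJSON.show(false)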

+-----------------------------------------------------------------------------------------------------------------+
|value                                                                                                            |
+-----------------------------------------------------------------------------------------------------------------+
|{"Emp":{"Name":"John","Sal":"2000","Address":[{"loc":"Sanjose222","Zip":"222"},{"loc":"dayton333","Zip":"333"}]}}|
+-----------------------------------------------------------------------------------------------------------------+

Please post what you have tried.