Apache Spark: how to do map_from_arrays() in Spark 2.3?

The following DataFrame in Spark 2.3 comes from a JSON file:

root
 |-- ext_attr: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- value: string (nullable = true)
I need to convert it into the following DataFrame:

root
 |-- ext_attr_map: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (nullable = true)
I see that in Spark 2.4, map_from_arrays() would probably do this.

How can this be achieved in Spark 2.3? A UDF or SQL code would be appreciated.
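One route in Spark 2.3, where map_from_arrays() is not available, is a UDF. The sketch below is not from the original answer; it assumes the struct fields are named "key" and "value" as in the question's schema. The pure pair-to-map logic is shown runnable, and the Spark wiring is sketched in the comments:

```scala
// Hypothetical UDF wiring for Spark 2.3 (df and ext_attr come from the
// question's schema; extAttrToMap is an illustrative name):
//
//   import org.apache.spark.sql.Row
//   import org.apache.spark.sql.functions.udf
//   val extAttrToMap = udf { pairs: Seq[Row] =>
//     pairs.map(r => r.getAs[String]("key") -> r.getAs[String]("value")).toMap
//   }
//   df.withColumn("ext_attr_map", extAttrToMap($"ext_attr"))

// The core conversion the UDF performs: a sequence of key/value pairs
// becomes a Map.
def pairsToMap(pairs: Seq[(String, String)]): Map[String, String] =
  pairs.toMap

println(pairsToMap(Seq("1" -> "a1", "2" -> "a2")))
```

Note that this produces a single merged map per row (map<string,string>), whereas the target schema in the question is an array of maps; adjust the UDF body accordingly if each struct should become its own one-entry map.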

scala> val df1 = Seq(Seq(("1","a1"),("2","a2")),Seq(("3","a3"),("4","a4"))).toDF()
df1: org.apache.spark.sql.DataFrame = [value: array<struct<_1:string,_2:string>>]

scala> df1.printSchema
root
 |-- value: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: string (nullable = true)


scala> df1.select('*, monotonically_increasing_id.as("id")).
     |   select('id, explode('value)).
     |   select('id, map($"col._1", $"col._2").as("value")).
     |   groupBy('id).
     |   agg(collect_list('value).as("value")).
     |   select('value).
     |   printSchema
root
 |-- value: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
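The pipeline above can be mirrored in plain Scala collections to see what it computes (illustrative only): each (key, value) struct becomes a single-entry map, and the per-row maps are collected back into an array, which is what the explode, map(), and groupBy/collect_list steps achieve distributedly.

```scala
// The same sample data as the df1 example above, as local collections.
val rows = Seq(
  Seq("1" -> "a1", "2" -> "a2"),
  Seq("3" -> "a3", "4" -> "a4")
)

// Each pair becomes a one-entry Map, preserving the per-row grouping:
// this matches the array<map<string,string>> schema printed above.
val mapped = rows.map(_.map { case (k, v) => Map(k -> v) })
println(mapped)
```

One caveat with the DataFrame version: collect_list after a groupBy does not guarantee that the elements come back in their original order within each row, so if element order matters it needs to be carried along explicitly.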
