Apache Spark: how to do map_from_arrays() in Spark 2.3?

The following DataFrame in Spark 2.3 comes from a JSON file:

root
 |-- ext_attr: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- value: string (nullable = true)
I need to convert it into the following DataFrame:

root
 |-- ext_attr_map: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (nullable = true)
I see that in Spark 2.4, map_from_arrays() would probably do this.

How can this be achieved in Spark 2.3? A UDF or SQL code would be appreciated.
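One route in Spark 2.3, where map_from_arrays() is not available, is a UDF. The sketch below is not from the original answer; it assumes the struct fields are named "key" and "value" as in the question's schema. The pure pair-to-map logic is shown runnable, and the Spark wiring is sketched in the comments:

```scala
// Hypothetical UDF wiring for Spark 2.3 (df and ext_attr come from the
// question's schema; extAttrToMap is an illustrative name):
//
//   import org.apache.spark.sql.Row
//   import org.apache.spark.sql.functions.udf
//   val extAttrToMap = udf { pairs: Seq[Row] =>
//     pairs.map(r => r.getAs[String]("key") -> r.getAs[String]("value")).toMap
//   }
//   df.withColumn("ext_attr_map", extAttrToMap($"ext_attr"))

// The core conversion the UDF performs: a sequence of key/value pairs
// becomes a Map.
def pairsToMap(pairs: Seq[(String, String)]): Map[String, String] =
  pairs.toMap

println(pairsToMap(Seq("1" -> "a1", "2" -> "a2")))
```

Note that this produces a single merged map per row (map<string,string>), whereas the target schema in the question is an array of maps; adjust the UDF body accordingly if each struct should become its own one-entry map.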

scala> val df1 = Seq(Seq(("1","a1"),("2","a2")),Seq(("3","a3"),("4","a4"))).toDF()
df1: org.apache.spark.sql.DataFrame = [value: array<struct<_1:string,_2:string>>]

scala> df1.printSchema
root
 |-- value: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: string (nullable = true)
 |    |    |-- _2: string (nullable = true)


scala> df1.select('*, monotonically_increasing_id.as("id")).
     |   select('id, explode('value)).
     |   select('id, map($"col._1", $"col._2").as("value")).
     |   groupBy('id).
     |   agg(collect_list('value).as("value")).
     |   select('value).
     |   printSchema
root
 |-- value: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
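The pipeline above can be mirrored in plain Scala collections to see what it computes (illustrative only): each (key, value) struct becomes a single-entry map, and the per-row maps are collected back into an array, which is what the explode, map(), and groupBy/collect_list steps achieve distributedly.

```scala
// The same sample data as the df1 example above, as local collections.
val rows = Seq(
  Seq("1" -> "a1", "2" -> "a2"),
  Seq("3" -> "a3", "4" -> "a4")
)

// Each pair becomes a one-entry Map, preserving the per-row grouping:
// this matches the array<map<string,string>> schema printed above.
val mapped = rows.map(_.map { case (k, v) => Map(k -> v) })
println(mapped)
```

One caveat with the DataFrame version: collect_list after a groupBy does not guarantee that the elements come back in their original order within each row, so if element order matters it needs to be carried along explicitly.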
