
Apache Spark / pyspark: joining tables on a nested key


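I have two tables with the sample schemas below. The keys of table A are nested in a list in table B. I want to join table A and table B on table A's key to produce table C. The values from table A should appear in table C as a nested structure, following the list of keys in table B. How can I do this with pyspark? Thanks.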

Table A

root 
|-- item1: string (nullable = true) 
|-- item2: long (nullable = true) 
|-- keyA: string (nullable = true) 
Table B

root 
|-- item1: string (nullable = true) 
|-- item2: long (nullable = true) 
|-- keyB: string (nullable = true) 
|-- keyAs: array (nullable = true) 
| |-- element: string (containsNull = true)
Table C

root 
|-- item1: string (nullable = true) 
|-- item2: long (nullable = true) 
|-- keyB: string (nullable = true) 
|-- keyAs: array (nullable = true) 
| |-- element: string (containsNull = true) 
|-- valueAs: array (nullable = true) 
| |-- element: struct (containsNull = true) 
| | |-- item1: string (nullable = true) 
| | |-- item2: long (nullable = true) 
| | |-- keyA: string (nullable = true)

To join A and B, you first need to explode B.keyAs, as follows:

from pyspark.sql.functions import explode
tableB.withColumn('keyA', explode('keyAs')).join(tableA, 'keyA')
For information on creating the nested structure, see