PySpark: create an array of structs from another array of structs
I am using PySpark 2.4 and want to create df_2 from df_1:
df_1:
root
|-- request: array (nullable = false)
| |-- address: struct (nullable = false)
| | |-- street: string (nullable = false)
| | |-- postcode: string (nullable = false)
df_2:
root
|-- request: array (nullable = false)
| |-- address: struct (nullable = false)
| | |-- street: string (nullable = false)
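I know a UDF is one way to do this, but is there another way, such as using map(), to achieve the same result?
Use the SQL higher-order function transform (available since Spark 2.4) through expr: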
from pyspark.sql.functions import expr
df_2 = df_1.withColumn("request", expr("transform(request, x -> struct(x.street) as address)"))
For each element of the request array, we select only the street field and create a new struct from it.
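To make the per-element behaviour concrete, here is a minimal plain-Python sketch of what the lambda does to a single row's request array. It is only a model and not Spark code: the dicts stand in for struct elements, the sample values are made up, and keep_street is a hypothetical helper name used purely for illustration.

```python
# Plain-Python model of one row's `request` array (no Spark required).
# Each dict stands in for one struct element; the values are invented sample data.
request = [
    {"street": "1 Main St", "postcode": "AB1 2CD"},
    {"street": "22 High St", "postcode": "EF3 4GH"},
]


def keep_street(element):
    """Mirror of the lambda body: build a new struct holding only `street`."""
    return {"street": element["street"]}


# transform(request, x -> ...) applies the lambda to every element in order
# and yields an array of the same length as the input.
transformed = [keep_street(x) for x in request]

print(transformed)
```

The output array has one entry per input element, and each entry has lost its postcode field, which is the shape change from df_1 to df_2. In Spark this work happens inside the SQL engine, so no Python UDF is involved.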