How to update column values in PySpark?

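In my scenario, the my_url column can sit at the top level or inside a nested column. How can I change its value recursively? The nested column can be a StructType or an ArrayType, and my_url can be at the second level: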

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- websites: struct (nullable = true)
 |    |-- cb_url: string (nullable = true)
 |    |-- domain_url: string (nullable = true)
 |    |-- email: string (nullable = true)
 |    |-- facebook_url: string (nullable = true)
 |    |-- homepage_url: string (nullable = true)
 |    |-- linkedin_url: string (nullable = true)
 |    |-- my_url: string (nullable = true)
 |    |-- phone: string (nullable = true)
 |    |-- twitter_url: string (nullable = true)
Or it can be at the top level:

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- my_url: string (nullable = true)
 |-- facebook_url: string (nullable = true)
Or it can look like this:

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- investments: struct (nullable = true)
 |    |-- investment_list: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- funding_round_info: struct (nullable = true)
 |    |    |    |    |-- announced_on: timestamp (nullable = true)
 |    |    |    |    |-- my_url: string (nullable = true)
It can be at any level.
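One way to handle an arbitrary depth is to walk the schema recursively and rebuild only the structs and arrays that contain my_url, leaving every other column untouched. The sketch below is a minimal illustration under stated assumptions: it takes the JSON form of the schema from df.schema.jsonValue(), emits Spark SQL strings for df.selectExpr(), and relies on the SQL functions named_struct, if and the higher-order function transform (Spark 2.4+). The helper names (quote, literal, rewrite, select_exprs) and the upper(...) update are hypothetical placeholders, and map types are not traversed.

```python
from typing import Any, Callable, Dict, List

# Hypothetical update hook: maps the SQL expression of the current value
# to the SQL expression of the new value, e.g. lambda c: f"upper({c})".
Update = Callable[[str], str]


def quote(name: str) -> str:
    """Backtick-quote an identifier for Spark SQL (backticks are doubled)."""
    return "`" + name.replace("`", "``") + "`"


def literal(name: str) -> str:
    """Single-quoted Spark SQL string literal, used for named_struct keys."""
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def rewrite(dtype: Any, ref: str, target: str, update: Update, depth: int = 0) -> str:
    """Rebuild `ref` (type `dtype`, in DataType.jsonValue() form) so that every
    field named `target` goes through `update`; return `ref` if nothing matches."""
    if isinstance(dtype, dict) and dtype.get("type") == "struct":
        parts, changed = [], False
        for field in dtype["fields"]:
            child = f"{ref}.{quote(field['name'])}"
            if field["name"] == target:
                new = update(child)
            else:
                new = rewrite(field["type"], child, target, update, depth)
            changed = changed or new != child
            parts.append(f"{literal(field['name'])}, {new}")
        if not changed:
            return ref
        # Keep a null struct null instead of turning it into a struct of nulls.
        return f"IF({ref} IS NULL, NULL, named_struct({', '.join(parts)}))"
    if isinstance(dtype, dict) and dtype.get("type") == "array":
        var = f"x{depth}"  # one lambda variable name per array nesting level
        body = rewrite(dtype["elementType"], var, target, update, depth + 1)
        return ref if body == var else f"transform({ref}, {var} -> {body})"
    return ref  # atomic types (string, timestamp, ...) and maps are left as-is


def select_exprs(schema: Dict[str, Any], target: str, update: Update) -> List[str]:
    """One '<expr> AS <name>' string per top-level column, for df.selectExpr()."""
    out = []
    for field in schema["fields"]:
        col = quote(field["name"])
        if field["name"] == target:
            new = update(col)
        else:
            new = rewrite(field["type"], col, target, update)
        out.append(f"{new} AS {col}")
    return out


# Usage (requires an existing DataFrame df):
# exprs = select_exprs(df.schema.jsonValue(), "my_url", lambda c: f"upper({c})")
# df = df.selectExpr(*exprs)
```

Rebuilding a struct with named_struct keeps its field order and copies the untouched fields through unchanged, so the output schema matches the input apart from the updated field. On Spark 3.1+, Column.withField combined with pyspark.sql.functions.transform can express the same recursion with Column objects instead of SQL strings.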