How do I update column values in PySpark?
In my scenario, the my_url column can sit at the top level or inside a nested column. How can I change the value of this column recursively? The nested column can be a StructType or an ArrayType. For example, my_url can be at the second level:
root
|-- _id: struct (nullable = true)
| |-- oid: string (nullable = true)
|-- websites: struct (nullable = true)
| |-- cb_url: string (nullable = true)
| |-- domain_url: string (nullable = true)
| |-- email: string (nullable = true)
| |-- facebook_url: string (nullable = true)
| |-- homepage_url: string (nullable = true)
| |-- linkedin_url: string (nullable = true)
| |-- my_url: string (nullable = true)
| |-- phone: string (nullable = true)
| |-- twitter_url: string (nullable = true)
Or it can be at the top level:
root
|-- _id: struct (nullable = true)
| |-- oid: string (nullable = true)
|-- my_url: string (nullable = true)
|-- facebook_url: string (nullable = true)
Or it can look like the following:
root
|-- _id: struct (nullable = true)
| |-- oid: string (nullable = true)
|-- investments: struct (nullable = true)
| |-- investment_list: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- funding_round_info: struct (nullable = true)
| | | | |-- announced_on: timestamp (nullable = true)
| | | | |-- my_url: string (nullable = true)
In short, my_url can appear at any level.
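One possible way to handle this, sketched under assumptions rather than offered as a definitive answer: walk the schema recursively and generate one Spark SQL expression per top-level column. Each struct on the path to my_url is rebuilt with `named_struct`, and each array is rewritten with the SQL higher-order function `transform` (Spark 2.4 or later). The helper names `rewrite` and `build_select_exprs`, the `make_value` callback, and the `lower(...)` replacement are hypothetical and chosen only for illustration. The schema is read in the dict form returned by `df.schema.jsonValue()`; the small dict at the end is a hand-written stand-in for that output.

```python
def rewrite(dtype, path, target, make_value, depth=0):
    """Return SQL that rebuilds `path` with every string field named
    `target` replaced, or None if the subtree contains no such field."""
    if not isinstance(dtype, dict):
        return None  # primitive types are plain strings such as "string"
    if dtype["type"] == "struct":
        parts, changed = [], False
        for field in dtype["fields"]:
            child = f"{path}.`{field['name']}`"
            if field["name"] == target and field["type"] == "string":
                value = make_value(child)
            else:
                value = rewrite(field["type"], child, target, make_value, depth)
            changed = changed or value is not None
            # Untouched fields are copied through unchanged.
            parts.append(f"'{field['name']}', {value if value is not None else child}")
        if not changed:
            return None
        # Keep null structs null instead of turning them into structs of nulls.
        return f"IF({path} IS NULL, NULL, named_struct({', '.join(parts)}))"
    if dtype["type"] == "array":
        var = f"x{depth}"  # distinct lambda variable per nesting level
        inner = rewrite(dtype["elementType"], var, target, make_value, depth + 1)
        return None if inner is None else f"transform({path}, {var} -> {inner})"
    return None  # maps and other types are left untouched in this sketch


def build_select_exprs(schema_json, target, make_value):
    """Build one selectExpr string per top-level column of the schema."""
    exprs = []
    for field in schema_json["fields"]:
        col = f"`{field['name']}`"
        if field["name"] == target and field["type"] == "string":
            value = make_value(col)
        else:
            value = rewrite(field["type"], col, target, make_value) or col
        exprs.append(f"{value} AS `{field['name']}`")
    return exprs


# Hand-written stand-in for df.schema.jsonValue() on the top-level example.
flat_schema = {"type": "struct", "fields": [
    {"name": "_id", "type": {"type": "struct", "fields": [
        {"name": "oid", "type": "string"}]}},
    {"name": "my_url", "type": "string"},
    {"name": "facebook_url", "type": "string"},
]}
flat_exprs = build_select_exprs(flat_schema, "my_url", lambda c: f"lower({c})")
print(flat_exprs)
```

With a real DataFrame `df`, the generated strings would be applied as `df.selectExpr(*build_select_exprs(df.schema.jsonValue(), "my_url", lambda c: f"lower({c})"))`. Because the structs are rebuilt, the nullability flags of the rebuilt fields may differ from the original schema, and field names containing quotes or backticks are not escaped in this sketch.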