Apache Spark / PySpark - casting a column inside a nested array
I have a dataframe with the following schema:
root
|-- Id: long (nullable = true)
|-- LastUpdate: string (nullable = true)
|-- Info: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Purchase: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- Amount: long (nullable = true)
| | | | |-- Name: string (nullable = true)
| | | | |-- Type: string (nullable = true)
How can I select the Amount column so that I can cast it?
I tried:
df = df.withColumn("Info.Purchase.Amount", df["Info.Purchase.Amount"].cast(DoubleType()))
But got:
org.apache.spark.sql.AnalysisException: cannot resolve '`Info`.`Purchase`['Amount']'
You can extract the nested array with:
df.select(col("info").getField("Purchase").getField("Amount")).show()
This gives you a flattened column of all the Amount values, which you can then cast.