Scala: Updating a Spark DataFrame whose columns are of type string and struct

Given this DataFrame:

Id         Identifiers
'123'      {"country":"PR", "idType":"SELECTED","status":"Not Done"}
'234'      {"country":"PR", "idType":"NOT SELECTED","status":"Not Done"}

When idType equals "SELECTED", Identifiers.status needs to be changed to "Done".

So the expected output would be:

Id         Identifiers
'123'      {"country":"PR", "idType":"SELECTED","status":"Done"}
'234'      {"country":"PR", "idType":"NOT SELECTED","status":"Not Done"}

I tried using:

df.withColumn("$NewIdentifiers", when($"Identifiers.idType" === "SELECTED", "DONE"))

But as of Spark 2.2 this fails and gives null values.

For this case you can use the built-in functions from_json (to create the individual columns) and to_json (to recreate the JSON object).

Example:

//sample data
df.show(false)
//+---+-------------------------------------------------------------+
//|Id |Identifiers                                                  |
//+---+-------------------------------------------------------------+
//|123|{"country":"PR", "idType":"SELECTED","status":"Not Done"}    |
//|234|{"country":"PR", "idType":"NOT SELECTED","status":"Not Done"}|
//+---+-------------------------------------------------------------+

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

//defining the schema
val sch=new StructType().add("country",StringType).add("idType",StringType).add("status",StringType)

//read the identifiers using from_json and pass the schema
val df1=df.withColumn("jsn",from_json(col("Identifiers"),sch)).select("Id","jsn.*")

//required json cols
val jsn_cols=df1.columns.filter(_.toLowerCase != "id")

//update the status column with when/otherwise, then recreate the JSON object with to_json
df1.withColumn("status",when(col("idType") === "SELECTED",lit("Done")).otherwise(col("status"))).
withColumn("identifiers",to_json(struct(jsn_cols.head,jsn_cols.tail:_*))).
drop(jsn_cols:_*).
show(false)

//+---+------------------------------------------------------------+
//|Id |identifiers                                                 |
//+---+------------------------------------------------------------+
//|123|{"country":"PR","idType":"SELECTED","status":"Done"}        |
//|234|{"country":"PR","idType":"NOT SELECTED","status":"Not Done"}|
//+---+------------------------------------------------------------+
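
The example above treats Identifiers as a JSON string. If the column is an actual struct (as the title suggests), a similar result can probably be reached without the JSON round trip by rebuilding the struct field by field. The sketch below assumes the three fields shown in the question; the object name StructStatusUpdate, the case class Identifier, and the helpers sampleData and updateStatus are hypothetical names introduced only for illustration. Note that when without otherwise yields null for rows that do not match, which is one likely reason the attempt in the question produced nulls.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object StructStatusUpdate {
  // Hypothetical case class mirroring the struct fields from the question
  case class Identifier(country: String, idType: String, status: String)

  // Sample data where Identifiers is a real struct column (an assumption)
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("123", Identifier("PR", "SELECTED", "Not Done")),
      ("234", Identifier("PR", "NOT SELECTED", "Not Done"))
    ).toDF("Id", "Identifiers")
  }

  // Rebuild the struct, replacing only the status field.
  // otherwise(...) keeps the existing status for non-matching rows
  // instead of producing null.
  def updateStatus(df: DataFrame): DataFrame =
    df.withColumn(
      "Identifiers",
      struct(
        col("Identifiers.country").as("country"),
        col("Identifiers.idType").as("idType"),
        when(col("Identifiers.idType") === "SELECTED", lit("Done"))
          .otherwise(col("Identifiers.status"))
          .as("status")
      )
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-status-update")
      .getOrCreate()

    updateStatus(sampleData(spark)).show(false)
    spark.stop()
  }
}
```

With this approach every field has to be listed explicitly, so it suits small, fixed schemas. On Spark 3.1 or later, Column.withField may be a shorter way to replace a single nested field.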