Scala: My task is to update a Spark DataFrame whose column types are string and struct. I want to do this in Scala
The input DataFrame is:
Id Identifers
'123' {"country":"PR", "idType":"SELECTED","status":"Not Done"}
'234' {"country":"PR", "idType":"NOT SELECTED","status":"Not Done"}
When idType equals "SELECTED", I need to set Identifers.status to "Done".
So the expected output would be:
Id Identifers
'123' {"country":"PR", "idType":"SELECTED","status":"Done"}
'234' {"country":"PR", "idType":"NOT SELECTED","status":"Not Done"}
I tried using:
df.withColumn("$NewIdentifers", when($"Identifers.idType" === "SELECTED", "DONE"))
However, as of Spark 2.2, this fails and yields null values.
For this case you can use the built-in functions from_json (to split the JSON into individual columns) and to_json (to re-create the JSON object).
Example:
//sample data
df.show(false)
//+---+-------------------------------------------------------------+
//|Id |Identifiers                                                  |
//+---+-------------------------------------------------------------+
//|123|{"country":"PR", "idType":"SELECTED","status":"Not Done"}    |
//|234|{"country":"PR", "idType":"NOT SELECTED","status":"Not Done"}|
//+---+-------------------------------------------------------------+
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// Schema of the JSON held in the Identifiers column
val sch = new StructType().add("country", StringType).add("idType", StringType).add("status", StringType)
// Parse Identifiers with from_json and flatten the resulting struct into top-level columns
val df1 = df.withColumn("jsn", from_json(col("Identifiers"), sch)).select("Id", "jsn.*")
// Names of the columns extracted from the JSON (everything except Id)
val jsn_cols = df1.columns.filter(_.toLowerCase != "id")
// Set status to "Done" where idType is "SELECTED" (otherwise keep the current value),
// then rebuild the JSON string with to_json and drop the helper columns
df1.withColumn("status", when(col("idType") === "SELECTED", lit("Done")).otherwise(col("status"))).
  withColumn("identifiers", to_json(struct(jsn_cols.head, jsn_cols.tail: _*))).
  drop(jsn_cols: _*).
  show(false)
//+---+------------------------------------------------------------+
//|Id |identifiers                                                 |
//+---+------------------------------------------------------------+
//|123|{"country":"PR","idType":"SELECTED","status":"Done"}        |
//|234|{"country":"PR","idType":"NOT SELECTED","status":"Not Done"}|
//+---+------------------------------------------------------------+
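A likely reason the attempt in the question produced nulls is that when(...) without otherwise(...) returns null for every row where the condition does not match. If the column is already a struct rather than a JSON string, a possible alternative is to rebuild the struct and change only the status field. The following is a minimal sketch under that assumption; the object name UpdateStructField, the helper markSelectedDone, the temporary column name raw and the local SparkSession are illustrative and not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object UpdateStructField {
  // Schema of the struct column (same fields as in the answer above)
  val identifiersSchema: StructType = new StructType()
    .add("country", StringType)
    .add("idType", StringType)
    .add("status", StringType)

  // Hypothetical helper: rebuilds the Identifiers struct, replacing only status.
  // otherwise(...) keeps the existing value, so non-matching rows do not become null.
  def markSelectedDone(df: DataFrame): DataFrame =
    df.withColumn(
      "Identifiers",
      struct(
        col("Identifiers.country").as("country"),
        col("Identifiers.idType").as("idType"),
        when(col("Identifiers.idType") === "SELECTED", lit("Done"))
          .otherwise(col("Identifiers.status"))
          .as("status")
      )
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("update-struct-field")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question; from_json is used only to obtain a struct-typed column
    val df = Seq(
      ("123", """{"country":"PR","idType":"SELECTED","status":"Not Done"}"""),
      ("234", """{"country":"PR","idType":"NOT SELECTED","status":"Not Done"}""")
    ).toDF("Id", "raw")
      .select($"Id", from_json($"raw", identifiersSchema).as("Identifiers"))

    markSelectedDone(df).show(false)
    spark.stop()
  }
}
```

With the sample rows, Id 123 should end up with status "Done" while Id 234 keeps "Not Done". On Spark 3.1 or later, Column.withField can replace a single struct field without listing the others.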