Spark JSON array
I have a Spark DataFrame with the following columns:
uuid|some_data
"A" |"ABC"
"B" |"DEF"
I need to convert it into nested JSON of the following format:
{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}
{"data":[{"attributes":[{"uuid":"B","some_data":"DEF"}]}]}
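I tried the following code to achieve this:
import org.apache.spark.sql.functions._
val jsonDF = dataFrame.select(
  to_json(struct(dataFrame.columns.map(column): _*)).alias("attributes"))
val jsonDF2 = jsonDF.select(
  to_json(struct(jsonDF.columns.map(column): _*)).alias("data"))
val jsonDF3 = jsonDF2
  .select(to_json(struct(jsonDF2.columns.map(column): _*)).alias("value"))
  .selectExpr("CAST(value AS STRING)")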
but I ended up with the following format instead:
{"data": {"attributes": {"uuid":"A","some_data":"ABC"}}}
{"data": {"attributes": {"uuid":"B","some_data":"DEF"}}}
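Please let me know what changes I need to make to get the required format.

Each level of the JSON document needs its own struct. In addition, you need one array to wrap data and another array to wrap attributes.
Combined: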
Seq(("A", "ABC"))
.toDF("uuid", "some_data")
.select(to_json(jsonData) as "data")
.show(false)
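+----------------------------------------------------------+
|data                                                      |
+----------------------------------------------------------+
|{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}|
+----------------------------------------------------------+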
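
The definition of jsonData is not shown in the snippet above. As a rough sketch of what it could look like, assuming the built-in struct, array and to_json functions from org.apache.spark.sql.functions, each level gets its own struct and each struct is wrapped in a single-element array. The names NestedJsonSketch, toNestedJson, row and attributes are hypothetical and only used for illustration; the local master setting is also an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct, to_json}

object NestedJsonSketch {
  // Hypothetical helper: builds one nested JSON string per input row.
  def toNestedJson(df: DataFrame): Array[String] = {
    // Innermost object: {"uuid": ..., "some_data": ...}
    val row = struct(col("uuid"), col("some_data"))
    // Wrap it in an array and name that array "attributes"
    val attributes = struct(array(row) as "attributes")
    // Wrap again in an array and name that array "data"
    val jsonData = struct(array(attributes) as "data")

    df.select(to_json(jsonData) as "data")
      .collect()
      .map(_.getString(0))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the question).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-json-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("A", "ABC"), ("B", "DEF")).toDF("uuid", "some_data")
    toNestedJson(df).foreach(println)

    spark.stop()
  }
}
```

Because every array here is built from exactly one struct, each output document contains single-element arrays, which matches the format asked for in the question. If several rows should end up in the same array, an aggregation such as collect_list would be needed instead, which is a different problem from the one asked here.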