Spark JSON array


I have a Spark DataFrame with the following columns:

uuid|some_data
"A" |"ABC"
"B" |"DEF"
I need to convert it to nested JSON in the following format:

{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}
{"data":[{"attributes":[{"uuid":"B","some_data":"DEF"}]}]}
I tried the following code to achieve this:

val jsonDF = dataFrame.select(
  to_json(struct(dataFrame.columns.map(column): _*)).alias("attributes"))
val jsonDF2 = jsonDF.select(
  to_json(struct(jsonDF.columns.map(column): _*)).alias("data"))
val jsonDF3 = jsonDF2.select(
  to_json(struct(jsonDF2.columns.map(column): _*)).alias("value"))
  .selectExpr("CAST(value AS STRING)")

but I ended up with the following format:

{"data": {"attributes": {"uuid":"A","some_data":"ABC"}}}
{"data": {"attributes": {"uuid":"B","some_data":"DEF"}}}

Please let me know what changes I need to make to get the required format.

Each JSON document needs its own struct. In addition, you need one array to wrap data and another array to wrap attributes.
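The expression that the later snippet refers to as jsonData is not shown in this thread. A minimal sketch of what it could look like, assuming the two columns from the question and a local SparkSession (the names attributes and result, the master setting, and the app name are illustrative assumptions, not taken from the original answer):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, struct, to_json}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-json-sketch")
  .getOrCreate()
import spark.implicits._

// Innermost struct: one JSON object holding the row's columns.
val attributes = struct(col("uuid"), col("some_data"))

// Wrap that object in an array named "attributes", put the array in its own
// struct, then wrap that struct in a second array named "data".
val jsonData = struct(
  array(
    struct(array(attributes) as "attributes")
  ) as "data"
)

// Serialize each row to a single JSON string.
val result = Seq(("A", "ABC"), ("B", "DEF"))
  .toDF("uuid", "some_data")
  .select(to_json(jsonData) as "data")
  .as[String]
  .collect()

result.foreach(println)
```

The point of the sketch is that each `struct` becomes a JSON object and each `array` becomes a JSON array, and that `to_json` is called only once on the outermost struct. Calling `to_json` at every level, as in the question, serializes the inner levels to strings before they are nested.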

Putting it together:

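// Assumes a spark-shell session (spark.implicits._ in scope) and that
// jsonData is the nested struct/array Column described above.
import org.apache.spark.sql.functions.to_json

Seq(("A", "ABC"))
  .toDF("uuid", "some_data")
  .select(to_json(jsonData) as "data")
  .show(false)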
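+----------------------------------------------------------+
|data                                                      |
+----------------------------------------------------------+
|{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}|
+----------------------------------------------------------+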