Scala - How to convert a String column to a JSON array
Using the DataFrame below, I get a JSON array, but its data type is String. I am looking for help converting this string into an array of JSON values.
import org.apache.spark.sql.functions.lit

val rawDF = spark.sql("select 1").withColumn("parent_id", lit("Parent_12345")).withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))
rawDF.show(false)
Input and output DataFrames:
Input DataFrame :
+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+
|item_id |s_tag |jsonString |
+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+
|Item_12345|S_12345|[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}] |
+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+
Output DataFrame :
+----------+-------+-----------------------------------------+
|item_id   |s_tag  |jsonString                               |
+----------+-------+-----------------------------------------+
|Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
|Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
|Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
+----------+-------+-----------------------------------------+
Problem statement:
jsonString is a String column, but its contents look like a JSON array. I want to convert this column into an array of JSON values so it can be exploded into one row per element, as in the output DataFrame above.
What I have tried so far:
val jsonArray = udf((value: String) => new JSONArray(value)) // how can I convert this to an array of JSON?
val strToJsonArray = rawDF.withColumn("arrJson", jsonArray(rawDF("jsonString"))).drop("jsonString") // this does not work
// If the column can be converted to an array, the code below splits the JSON column into the expected output:
val splittedDF = strToJsonArray.withColumn("splittedJson", explode(strToJsonArray.col("arrJson"))).drop("arrJson")
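(Note: the UDF above fails because JSONArray is not a type Spark SQL can serialize. If the element schema is known, from_json can parse the string directly without any UDF. The sketch below is not from the original post; it assumes Spark 2.4+ and that every element has the shape {"<name>": {"Info": ..., "Res": ...}} shown above.)

```scala
import org.apache.spark.sql.functions.{col, explode, from_json, to_json}
import org.apache.spark.sql.types.{ArrayType, MapType, StringType, StructType}

// Each array entry is a single-key object, so a map<string, struct> fits it.
val elementSchema = ArrayType(
  MapType(StringType, new StructType().add("Info", StringType).add("Res", StringType)))

val splittedDF = rawDF
  .withColumn("elem", explode(from_json(col("jsonString"), elementSchema))) // one row per array entry
  .withColumn("jsonString", to_json(col("elem")))                           // back to a JSON string
  .drop("elem")
```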
How can I convert the string to an array of JSON values?
There is no need for a UDF here; we can use the Spark built-in functions split, regexp_replace and explode.
Example:
//sample data
val rawDF = spark.sql("""select string("Item_12345") as item_id""").withColumn("s_tag", lit("S_12345")).withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))
//To keep each element well formed, we first replace (},) with (}},), then remove the enclosing brackets ([ and ]), then split on (},), which yields an array we finally explode.
rawDF.
selectExpr("item_id","s_tag","""explode(split(regexp_replace(regexp_replace(jsonString,'\\},','}},'),'(\\[|\\])',''),'\\},')) as jsonString""").
show(false)
//+----------+-------+-----------------------------------------+
//|item_id |s_tag |jsonString |
//+----------+-------+-----------------------------------------+
//|Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
//|Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
//|Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
//+----------+-------+-----------------------------------------+
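The regex chain is easier to verify outside Spark. The snippet below (my own illustration, not from the answer) replays the same three steps with plain String methods, whose Java-regex semantics match regexp_replace and split:

```scala
// Same surgery as the selectExpr above, on a plain String (no Spark needed).
val raw = """[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""

// 1) every "}," gains an extra "}", so each piece keeps both closing braces
val doubled = raw.replaceAll("""\},""", "}},")
// 2) drop the enclosing brackets
val unwrapped = doubled.replaceAll("""\[|\]""", "")
// 3) splitting consumes one "}" and the comma, leaving well-formed objects
val parts = unwrapped.split("""\},""")

parts.foreach(println)
// {"First":{"Info":"ABCD123","Res":"5.2"}}
// {"Second":{"Info":"ABCD123","Res":"5.2"}}
// {"Third":{"Info":"ABCD123","Res":"5.2"}}
```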
Hey @Shu, great! Your code works like magic....! Thank you so much. Also, could you help me with this other question?