Scala: How to convert a string column to a JSON array

Using the DataFrame below, I get a JSON array, but its data type is String. I am looking for help converting this string into a JSON array.

val rawDF = spark.sql("select 1").withColumn("parent_id", lit("Parent_12345")).withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))
rawDF.show(false)

Input and output DataFrames:

Input DataFrame :

+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+
|item_id   |s_tag  |jsonString                                                                                                                         |
+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+
|Item_12345|S_12345|[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]      |
+----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+


Output DataFrame :
+----------+-------+-----------------------------------------+
|item_id   |s_tag  |jsonString                               |
+----------+-------+-----------------------------------------+
|Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
+----------+-------+-----------------------------------------+
|Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
+----------+-------+-----------------------------------------+
|Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
+----------+-------+-----------------------------------------+
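Problem statement:

jsonString holds string data that looks like a JSON array. I want to convert this column into a JSON array so that it can be split into one row per element, as shown in the output DataFrame.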

What I have tried so far:

val jsonArray = udf((value: String) => new JSONArray(value)) // Or how to convert as Array of json.

val strToJsonArray = rawDF.withColumn("arrJson", jsonArray(rawDF("jsonString"))).drop("jsonString") //This is not working.

//If We can convert To Array then using below code I can Split the Json Column in expected Output.
val splittedDF = strToJsonArray.withColumn("splittedJson", explode(strToJsonArray.col("arrJson"))).drop("arrJson")

How can I convert the string into an array of JSON values?

There is no need for a UDF in this case; we can use Spark's built-in functions split, regexp_replace, and explode.

Example:

//sample data
val rawDF = spark.sql("""select string("Item_12345") as item_id""").withColumn("s_tag", lit("S_12345")).withColumn("jsonString", lit("""[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""))

//to keep each element valid, first replace (},) with (}},), then remove the square brackets ([ and ]) and split on (},); the split consumes the extra brace and yields an array, which we finally explode into rows.
rawDF.
selectExpr("item_id","s_tag","""explode(split(regexp_replace(regexp_replace(jsonString,'(\\\},)','}},'),'(\\\[|\\\])',''),"},")) as jsonString""").
show(false)

//+----------+-------+-----------------------------------------+
//|item_id   |s_tag  |jsonString                               |
//+----------+-------+-----------------------------------------+
//|Item_12345|S_12345|{"First":{"Info":"ABCD123","Res":"5.2"}} |
//|Item_12345|S_12345|{"Second":{"Info":"ABCD123","Res":"5.2"}}|
//|Item_12345|S_12345|{"Third":{"Info":"ABCD123","Res":"5.2"}} |
//+----------+-------+-----------------------------------------+
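
To see why the extra brace is needed, here is a minimal plain-Scala sketch of the same three steps applied to an ordinary string, without Spark. The object name SplitJsonArrayString and the helper splitElements are made up for illustration. It assumes, as in the sample data, that the sequence }, only occurs at the boundary between two array elements; an element with more than one top-level key, such as {"a":{"x":1},"b":2}, would be cut in the wrong place, and a real JSON parser would be the safer choice there.

```scala
object SplitJsonArrayString {
  // Hypothetical helper: mirrors the regexp_replace / split chain above on a plain String.
  def splitElements(jsonArray: String): Array[String] =
    jsonArray
      .replaceAll("\\},", "}},")  // turn each "}," into "}},": one extra brace per boundary
      .replaceAll("\\[|\\]", "")  // drop the enclosing square brackets
      .split("\\},")              // split consumes "}," and leaves each element with both closing braces

  def main(args: Array[String]): Unit = {
    val input =
      """[{"First":{"Info":"ABCD123","Res":"5.2"}},{"Second":{"Info":"ABCD123","Res":"5.2"}},{"Third":{"Info":"ABCD123","Res":"5.2"}}]"""
    // Prints one JSON object per line, matching the rows of the DataFrame output above.
    splitElements(input).foreach(println)
  }
}
```

Without the first replacement, the split would eat one of the two closing braces of every element except the last, leaving unbalanced JSON.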

Hey @Shu, great!! Your code works like magic! Thank you very much. Also, could you help me with this question?