
Spark: convert an array of JSON strings to an array of structs


I am looking for a way to convert an array of JSON strings into an array of structs.

Sample data:

{
  "col1": "col1Value",
  "col2": [
    "{\"SubCol1\":\"ABCD\",\"SubCol2\":\"EFGH\"}",
    "{\"SubCol1\":\"IJKL\",\"SubCol2\":\"MNOP\"}"
  ]
}
Dataset schema:

StructType(StructField(col1,StringType,true),StructField(col2,ArrayType(StringType,true),true))
Expected output:

{
  "col1": "col1Value",
  "col2": [
    {"SubCol1": "ABCD", "SubCol2": "EFGH"},
    {"SubCol1": "IJKL", "SubCol2": "MNOP"}
  ]
}
Expected schema:

StructType(StructField(col1,StringType,true),StructField(col2,ArrayType(StructType(StructField(SubCol1,StringType,true),StructField(SubCol2,StringType,true)),true),true))
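I tried
df.withColumn("col2", from_json($"col2", new_schema))
but this gives me the error:

org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(`col2`)' due to data type mismatch: argument 1 requires string type, however, '`col2`' is of array<string> type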


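You can cast col2 to string type first. Casting an array<string> column to string renders it as one bracketed, comma-separated string, which from_json can then parse as a JSON array: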

val df2 = df.withColumn("col2", 
    from_json(
        $"col2".cast("string"), 
        lit("array<struct<SubCol1:string, SubCol2:string>>")
        // or use new_schema as in your code
    )
)
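For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the cast-then-parse approach. The object name, the helper parseCol2, and the local SparkSession settings are assumptions made for illustration; only the column names and the schema come from the question. It relies on the array-to-string cast producing text that is valid JSON, which holds for this sample data but is worth checking on your own Spark version and data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructType}

// Hypothetical wrapper object, for illustration only.
object JsonArrayToStructArray {

  // Target type of col2: array<struct<SubCol1:string, SubCol2:string>>.
  val elementSchema: StructType =
    new StructType().add("SubCol1", StringType).add("SubCol2", StringType)

  // Cast the array<string> column to a single string, then parse that
  // string as a JSON array of structs.
  def parseCol2(df: DataFrame): DataFrame =
    df.withColumn("col2", from_json(col("col2").cast("string"), ArrayType(elementSchema)))

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-array-to-struct-array")
      .getOrCreate()
    import spark.implicits._

    // One row shaped like the question's sample: col1 string, col2 array<string>.
    val df = Seq(
      ("col1Value", Seq(
        """{"SubCol1":"ABCD","SubCol2":"EFGH"}""",
        """{"SubCol1":"IJKL","SubCol2":"MNOP"}"""
      ))
    ).toDF("col1", "col2")

    val df2 = parseCol2(df)
    df2.printSchema()
    df2.show(truncate = false)

    spark.stop()
  }
}
```

Passing the schema as an ArrayType value, as here, or as a DDL string wrapped in lit(...) should be equivalent; the typed form just avoids typos inside the string.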

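df.withColumn("col2", from_json($"col2".cast("string"), ArrayType(new StructType().add("SubCol1", StringType).add("SubCol2", StringType))))
solved the problem. Thanks. Along the same lines, how can we convert MapType(StringType, StringType) to MapType(StringType, StructType)? Here I want to convert the values of a map of JSON strings into structs. Casting the column data to string does not work in this case.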
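One possible way to handle the map case, not confirmed anywhere in this thread, is to parse each map value separately with transform_values, which is available in the Scala API from Spark 3.0. The sketch below assumes the map values have the same two-field JSON shape as in the question; the object name, the helper parseValues, the keys k1/k2 and the sample row are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, transform_values}
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical wrapper object, for illustration only.
object JsonMapValuesToStruct {

  // Assumed shape of every JSON value in the map.
  val valueSchema: StructType =
    new StructType().add("SubCol1", StringType).add("SubCol2", StringType)

  // Keep the keys and parse each JSON string value into a struct
  // (requires Spark 3.0 or later for transform_values).
  def parseValues(df: DataFrame): DataFrame =
    df.withColumn("col2", transform_values(col("col2"), (_, v) => from_json(v, valueSchema)))

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-map-values-to-struct")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row: col2 is map<string,string> holding JSON strings.
    val df = Seq(
      ("col1Value", Map(
        "k1" -> """{"SubCol1":"ABCD","SubCol2":"EFGH"}""",
        "k2" -> """{"SubCol1":"IJKL","SubCol2":"MNOP"}"""
      ))
    ).toDF("col1", "col2")

    val df2 = parseValues(df)
    df2.printSchema()
    df2.show(truncate = false)

    spark.stop()
  }
}
```

The same element-wise idea should also work for the original array column through transform(col("col2"), x => from_json(x, valueSchema)), which avoids depending on how a cast renders the array as text.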