Serialization problem when updating a JSON file in a DataFrame


I read in a JSON file and stored it in a DataFrame:

    val df1 = spark.read
      .option("multiline", "true")
      .json("dbfs:/something.json")

The schema of this file looks like this:

    Connections: array
      element: struct
        Name: string
        Properties: struct
          database: string
          driver: string
          hostname: string
          password.encrypted: string
          password.encrypted.keyARN: string
          port: string
          username: string
        Type: string

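I want to build a function that I can reuse whenever I need to add a new connection.
I'm not sure what the best way to do this is. Should I build a new schema, fill it with data, append it to the original Connections array, and then simply write it back to the file?
This is how I tried to get it working, but I get a serialization error: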

    import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType, ArrayType, FloatType}

    val zipsSchema3 = StructType(List(
      StructField("Name", StringType, true),
      StructField("Properties", StructType(List(
        StructField("driver", StringType, true),
        StructField("hostname", StringType, true),
        StructField("password.encrypted", StringType, true),
        StructField("password.encrypted.keyARN", StringType, true),
        StructField("port", StringType, true),
        StructField("username", StringType, true)
      ))),
      StructField("Type", StringType, true)
    ))

    val data2 = Seq(
      Row("db2", struct("test", "testHost", "encpwd", "keyTest", "testPort", "testUser"), "typeTest"))

    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(data2),
      zipsSchema3
    )

Or is there a built-in function that could be used in this case?

Thanks in advance for all your suggestions! :)
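
On the built-in-function question: one possible route is to build the new element as a struct column and append it to the existing array with `concat`. The following is only a sketch under assumptions, not a confirmed solution from this thread. It assumes Spark 2.4 or later (where `concat` accepts array columns), and the names `addConnection` and `propertyFields` are hypothetical helpers introduced for illustration. The field order is copied from the schema printed in the question, because the new struct has to line up with the existing element type.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{array, col, concat, lit, struct}

// Hypothetical helper data: Properties field names, in the order shown in the question's schema.
val propertyFields = Seq(
  "database", "driver", "hostname",
  "password.encrypted", "password.encrypted.keyARN",
  "port", "username"
)

// Hypothetical helper: returns a new DataFrame whose Connections array has one more element.
def addConnection(
    df: DataFrame,
    name: String,
    properties: Map[String, String],
    connType: String): DataFrame = {
  // One string column per Properties field; keys missing from the map become null.
  val props: Seq[Column] = propertyFields.map { field =>
    lit(properties.get(field).orNull).cast("string").as(field)
  }
  val newConnection = struct(
    lit(name).as("Name"),
    struct(props: _*).as("Properties"),
    lit(connType).as("Type")
  )
  // Wrap the struct in a one-element array and append it to the existing array.
  df.withColumn("Connections", concat(col("Connections"), array(newConnection)))
}
```

A call might look like `val updated = addConnection(df1, "db2", Map("hostname" -> "testHost", "port" -> "testPort"), "typeTest")`. Writing the result back out is left aside here.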
I'm not quite sure why, but the serialization error went away when I ran it like this:

    val zipsSchema3 = StructType(List(
      StructField("Name", StringType, true),
      StructField("Properties", StructType(List(
        StructField("driver", StringType, true),
        StructField("hostname", StringType, true),
        StructField("password.encrypted", StringType, true),
        StructField("password.encrypted.keyARN", StringType, true),
        StructField("port", StringType, true),
        StructField("username", StringType, true)
      ))),
      StructField("Type", StringType, true)
    ))

    val data2 = Seq(("db2", Seq("test", "testHost", "encpwd", "keyTest", "testPort", "testUser"), "typeTest"))

    val rdd = spark.sparkContext.parallelize(data2)
      .map { case (name, props, sType) => Row(name, props, sType) }

    val df = spark.createDataFrame(
      rdd,
      zipsSchema3
    )

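When I run this code I don't get an error, but I can't output the data in the DataFrame. I get the following error: Error while encoding: java.lang.RuntimeException: scala.collection.immutable.$colon$colon is not a valid external type for schema of struct. What I mean is that, for example, df.show() and display(df) both refuse to show me the result; otherwise it looks fine. My own code has the same problem, only the error message is different :)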
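
The error message points at the nested value: `scala.collection.immutable.$colon$colon` is a Scala `List` (here the `Seq` passed for `Properties`), while a `StructType` field expects a `Row` as its external value. The original attempt likely failed for a related reason, since `struct(...)` returns a `Column` expression rather than data. A minimal sketch of the same example with a nested `Row` is shown below. It reuses the six-field `Properties` schema from the code above, and the names `propertiesSchema` and `connectionSchema` are introduced here only for readability.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// In a Databricks notebook `spark` already exists; getOrCreate() returns that session.
val spark = SparkSession.builder().getOrCreate()

val propertiesSchema = StructType(List(
  StructField("driver", StringType, true),
  StructField("hostname", StringType, true),
  StructField("password.encrypted", StringType, true),
  StructField("password.encrypted.keyARN", StringType, true),
  StructField("port", StringType, true),
  StructField("username", StringType, true)
))

val connectionSchema = StructType(List(
  StructField("Name", StringType, true),
  StructField("Properties", propertiesSchema, true),
  StructField("Type", StringType, true)
))

// The value for a struct field is itself a Row, not a Seq and not a Column.
val data = Seq(
  Row("db2", Row("test", "testHost", "encpwd", "keyTest", "testPort", "testUser"), "typeTest")
)

val df = spark.createDataFrame(spark.sparkContext.parallelize(data), connectionSchema)
df.show(false)
```

Note that this schema has no `database` field, so it does not yet match the element type of `Connections` in the file; that would need to be aligned before appending.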