
Serialization issue when updating a JSON file in a DataFrame


I read in a JSON file and stored it in a DataFrame:

val df1 = spark.read.option("multiline", "true")
            .json("dbfs:/something.json")
The schema of this file looks like this:

Connections:array
    element:struct
           Name:string
           Properties:struct
                   database:string
                   driver:string
                   hostname:string
                   password.encrypted:string
                   password.encrypted.keyARN:string
                   port:string
                   username:string
           Type:string
I want to build a function that can be reused whenever I want to add a new connection.

I'm not sure of the best way to do this. Should I build a new schema, populate it with the data, append it to the original Connections array, and then simply write it back to the file?

This is how I tried to get it working, but there is a serialization error:

import org.apache.spark.sql.types.{StructType, StructField, IntegerType, StringType, ArrayType, FloatType}

val zipsSchema3 = StructType(List(
  StructField("Name", StringType, true), 
  StructField("Properties", StructType(List(
      StructField("driver", StringType, true), 
      StructField("hostname", StringType, true), 
      StructField("password.encrypted", StringType, true), 
      StructField("password.encrypted.keyARN", StringType, true), 
      StructField("port", StringType, true), 
      StructField("username", StringType, true)
 ))),
  StructField("Type", StringType, true)
))

// NOTE: struct(...) here resolves to org.apache.spark.sql.functions.struct,
// which builds a Column expression, not a data value; embedding it in a Row
// is the likely source of the serialization error.
val data2 = Seq(
  Row("db2", struct("test","testHost","encpwd","keyTest","testPort","testUser"), "typeTest"))

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data2),
  zipsSchema3
)
Or are there some built-in functions that could be used in this case?


Thanks in advance for all your suggestions! :)
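
On the built-in-function question: one route that avoids hand-built Rows entirely is to construct the new entry with struct and lit, append it to the existing Connections array, and write the result back. A minimal sketch, assuming Spark 2.4+ (for concat on array columns); every literal value and the output path are placeholders, and the field names and order must match the existing element type:

import org.apache.spark.sql.functions.{array, col, concat, lit, struct}

// Build the new connection as a nested struct column.
val newConn = struct(
  lit("testDb").as("database") +: Seq.empty: _*
)

val newConnection = struct(
  lit("db2").as("Name"),
  struct(
    lit("testDb").as("database"),
    lit("test").as("driver"),
    lit("testHost").as("hostname"),
    lit("encpwd").as("password.encrypted"),
    lit("keyTest").as("password.encrypted.keyARN"),
    lit("testPort").as("port"),
    lit("testUser").as("username")
  ).as("Properties"),
  lit("typeTest").as("Type")
)

// Append the new entry to the array and write the result back out.
// Note: Spark writes a directory of line-delimited JSON part files,
// not a single multiline document, so this won't round-trip the
// original file byte-for-byte.
val df2 = df1.withColumn("Connections", concat(col("Connections"), array(newConnection)))
df2.write.mode("overwrite").json("dbfs:/something_updated.json")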

I'm not quite sure why, but when I run it like this the serialization error goes away:

val zipsSchema3 = StructType(List(
  StructField("Name", StringType, true), 
  StructField("Properties", StructType(List(
      StructField("driver", StringType, true), 
      StructField("hostname", StringType, true), 
      StructField("password.encrypted", StringType, true), 
      StructField("password.encrypted.keyARN", StringType, true), 
      StructField("port", StringType, true), 
      StructField("username", StringType, true)
  ))),
  StructField("Type", StringType, true)
))

val data2 = Seq(("db2", Seq("test","testHost","encpwd","keyTest","testPort","testUser"), "typeTest"))

val rdd = spark.sparkContext.parallelize(data2)
  .map{ case (name, props, sType) => Row(name, props, sType) } // props is still a Seq here; see the comments below

val df = spark.createDataFrame(
  rdd,
  zipsSchema3  
)

I don't get an error when running this code, but I can't write the data out of the DataFrame. I get the following error: Error while encoding: java.lang.RuntimeException: scala.collection.immutable.$colon$colon is not a valid external type for schema of struct. I mean, for example, df.show() and display(df) all refuse to show me any results; otherwise it looks fine. I have the same problem with my code, just with a different error message :)
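
For completeness, the encoder error quoted in the comments (scala.collection.immutable.$colon$colon is not a valid external type for schema of struct) comes from passing a Scala Seq where the schema declares a struct: a nested StructType needs a nested Row. A minimal sketch of the corrected construction, reusing zipsSchema3 from above; data3 and df3 are illustrative names and the values are the same placeholders:

import org.apache.spark.sql.Row

// "Properties" is declared as a StructType, so the middle element must be
// a nested Row rather than a Seq.
val data3 = Seq(
  Row("db2", Row("test", "testHost", "encpwd", "keyTest", "testPort", "testUser"), "typeTest")
)

val df3 = spark.createDataFrame(
  spark.sparkContext.parallelize(data3),
  zipsSchema3
)

df3.show()  // should now display the row instead of failing at action time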