Writing a nested DataFrame to a JSON file lowercases the attribute names

json scala dataframe apache-spark

I have a large nested DataFrame with many columns. Here is an excerpt of the anonymized schema:

df.printSchema()
root
 |-- column1: null (nullable = true)
 |-- camelCaseColumn1: string (nullable = false)
 |-- column2: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- column3: string (nullable = true)
 |    |    |-- camelCaseColumn2: string (nullable = true)
 |    |    |-- column4: string (nullable = true)
 |    |    |-- camelCaseColumn3: struct (nullable = true)
 |    |    |    |-- column5: null (nullable = true)
 |    |    |    |-- column6: null (nullable = true)
 |    |    |-- camelCaseColumn4: string (nullable = true)
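For anyone trying to reproduce this, the schema excerpt above can be rebuilt by hand roughly as follows. This is a sketch, not the poster's code: the `ExampleSchema` object name is hypothetical, and the `null` columns are modelled as `NullType`, which is the type `printSchema()` prints as `null`.

```scala
import org.apache.spark.sql.types._

// Hypothetical reconstruction of the anonymized schema excerpt above.
// Columns printed as "null" by printSchema() are modelled as NullType.
object ExampleSchema {
  // camelCaseColumn3: struct with two null-typed fields
  val camelCaseColumn3: StructType = StructType(Seq(
    StructField("column5", NullType, nullable = true),
    StructField("column6", NullType, nullable = true)
  ))

  // Element type of the column2 array
  val element: StructType = StructType(Seq(
    StructField("column3", StringType, nullable = true),
    StructField("camelCaseColumn2", StringType, nullable = true),
    StructField("column4", StringType, nullable = true),
    StructField("camelCaseColumn3", camelCaseColumn3, nullable = true),
    StructField("camelCaseColumn4", StringType, nullable = true)
  ))

  // Top-level schema
  val schema: StructType = StructType(Seq(
    StructField("column1", NullType, nullable = true),
    StructField("camelCaseColumn1", StringType, nullable = false),
    StructField("column2", ArrayType(element, containsNull = true), nullable = true)
  ))
}
```

Combined with `sqlContext.createDataFrame(rowRdd, ExampleSchema.schema)`, this gives a small DataFrame with the same nesting to test the write path against.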

I write the DataFrame out as JSON:

df.write.mode("overwrite").json(targetPath)

I merge all the generated part files with the copyMerge() function:

FileUtil.copyMerge(fs, srcPath, fs, dstFile, deleteSource, configuration, null)
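The merge step can be wrapped in a small helper like the one below. It is a sketch assuming Hadoop 2.x, where `FileUtil.copyMerge` still exists (it was deprecated in 2.8 and removed in Hadoop 3.0); the `MergeParts` / `mergePartFiles` names are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileUtil, Path}

// Hypothetical helper around Hadoop 2.x FileUtil.copyMerge.
object MergeParts {
  def mergePartFiles(srcDir: String, dstFile: String, deleteSource: Boolean): Boolean = {
    val conf = new Configuration()
    val src = new Path(srcDir)
    // Resolve the file system from the path (HDFS, local, ...)
    val fs = src.getFileSystem(conf)
    // Concatenates the files directly under srcDir (part-00000, part-00001, ...)
    // in sorted order into dstFile. addString is null, so nothing is inserted
    // between files. Returns false if srcDir is not a directory.
    FileUtil.copyMerge(fs, src, fs, new Path(dstFile), deleteSource, conf, null)
  }
}
```

Because each part file written by `df.write.json` ends with a newline, the merged file stays valid JSON Lines without a separator string.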

Then, when I fetch the resulting JSON file with hdfs dfs -cat or -get, I get:

 {
   "column1":"value",
   "camelCaseColumn1":"value",
   "column2":[
      {
         "column3":"value",
         "camelcasecolumn2":"value",
         "column4":"value",
         "camelcasecolumn3":{
            "column5":"value",
            "column6":"value"
         },
         "camelcasecolumn4":"value",

We can see that camelCase is preserved at the first level of the JSON, but at deeper levels the names are lowercased.
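To see exactly which nested paths lost their case, one option is to flatten both the DataFrame's schema and the schema Spark infers when reading the written file back into dotted field paths, then compare them case-insensitively. This is a diagnostic sketch, not a fix; `SchemaPaths`, `fieldPaths` and `caseMismatches` are hypothetical names.

```scala
import org.apache.spark.sql.types._

// Hypothetical diagnostic: list nested field paths whose casing differs
// between an expected schema and an actual (e.g. read-back) schema.
object SchemaPaths {
  // Flatten a schema into dotted paths such as "column2.camelCaseColumn2".
  def fieldPaths(dt: DataType, prefix: String = ""): Seq[String] = dt match {
    case s: StructType =>
      s.fields.toSeq.flatMap { f =>
        val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
        path +: fieldPaths(f.dataType, path)
      }
    // Arrays and maps add no name of their own; descend into element / value type.
    case a: ArrayType => fieldPaths(a.elementType, prefix)
    case m: MapType   => fieldPaths(m.valueType, prefix)
    case _            => Seq.empty
  }

  // Pairs of (expected path, actual path) that match only when case is ignored.
  def caseMismatches(expected: StructType, actual: StructType): Seq[(String, String)] = {
    val actualByLower = fieldPaths(actual).map(p => p.toLowerCase -> p).toMap
    fieldPaths(expected).flatMap { p =>
      actualByLower.get(p.toLowerCase).filter(_ != p).map(actualPath => p -> actualPath)
    }
  }
}
```

Usage, assuming Spark 1.6 and the merged file from the question: `SchemaPaths.caseMismatches(df.schema, sqlContext.read.json(dstFile).schema)`.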

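Do you have any explanation for this, and ideally a way to keep camelCase JSON attribute names regardless of how deeply they are nested in the file? We are using Spark 1.6.3 in our environment.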

Edit: found a solution, see the comment below.

OK, I found something. It seems that writing the df like this:

df.toJSON.saveAsTextFile(targetPath)

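solves the problem. I think the original issue has to do with how the DataFrame JSON writer handles case sensitivity for nested StructType columns. The output layout is the same as before (part files in a single directory), so copyMerge() still works.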
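Putting the workaround and the merge together might look like the sketch below. It assumes Spark 1.6.x, where `DataFrame.toJSON` returns an `RDD[String]` with one JSON document per row, and Hadoop 2.x for `copyMerge`; `JsonTextWriter` / `writeJsonAsText` are hypothetical names.

```scala
import org.apache.hadoop.fs.{FileUtil, Path}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: write via toJSON + saveAsTextFile, then merge the parts.
object JsonTextWriter {
  def writeJsonAsText(df: DataFrame, targetPath: String, dstFile: String): Boolean = {
    val sc = df.sqlContext.sparkContext
    val conf = sc.hadoopConfiguration
    val target = new Path(targetPath)
    val dst = new Path(dstFile)
    val fs = target.getFileSystem(conf)
    // saveAsTextFile has no overwrite mode (unlike df.write.mode("overwrite")),
    // and copyMerge refuses an existing destination, so clear both first.
    if (fs.exists(target)) fs.delete(target, true)
    if (fs.exists(dst)) fs.delete(dst, false)
    // Spark 1.6: toJSON is an RDD[String], one JSON object per row.
    df.toJSON.saveAsTextFile(targetPath)
    // Same part-file directory layout as df.write.json, so copyMerge is unchanged.
    FileUtil.copyMerge(fs, target, fs, dst, false, conf, null)
  }
}
```

On Spark 2.x `toJSON` returns a `Dataset[String]` instead, so the equivalent call there would be `df.toJSON.rdd.saveAsTextFile(targetPath)`; on Hadoop 3 the merge step needs a replacement because `copyMerge` no longer exists.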