How to write valid JSON in Spark


I need to write valid JSON, but Spark only lets me write it one row at a time, like this:

{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
The JSON above is invalid. Instead, I need this:

{
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}},
{"name":"Michael", "address":{"city":null, "state":"California"}}
}

How can I achieve this in Java?
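
For context, here is a minimal, hypothetical Java setup for the DataFrame used in the answer below; the SparkSession and the input path "people.json" are assumptions, not from the original post:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical setup; adjust the app name and input path to your environment.
SparkSession spark = SparkSession.builder().appName("valid-json").getOrCreate();

// Spark's JSON reader expects line-delimited input: one object per line,
// exactly like the sample records shown above.
Dataset<Row> df = spark.read().json("people.json");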

Start by converting the DataFrame rows to JSON:

Scala

val jsonDs = df.toJSON
Java

Dataset<String> jsonDs = simpleProf.toJSON();
The next steps depend on whether you want to save everything to one file or to multiple files, one per partition.

Save to a single JSON file

Scala

val count = jsonDs.count()
jsonDs
  .repartition(1) // make sure it is only one partition and in consequence one output file
  .rdd
  .zipWithIndex()
  .map { case(json, idx) =>
      if(idx == 0) "[\n" + json + "," // first row
      else if(idx == count-1) json + "\n]" // last row
      else json + ","
  }
  .saveAsTextFile("path")
Java

long count = jsonDs.count(); // total rows, referenced in the lambda below
jsonDs
  .repartition(1) // make sure it is only one partition and in consequence one output file
  .javaRDD()
  .zipWithIndex()
  .map(t -> t._2 == 0 ? "[\n" + t._1 + "," : t._2 == count-1 ? t._1 + "\n]" : t._1 + ",")
  .saveAsTextFile("path");
Save to multiple JSON files, one per partition

Scala

jsonDs
  .mapPartitions(vals => Iterator("[" + vals.mkString(",") + "]"))
  .write
  .text("path")
Java

import java.util.Arrays;
import org.apache.commons.lang3.StringUtils;
import org.apache.spark.sql.Encoders;

jsonDs
  .mapPartitions(input -> Arrays.asList("[" + StringUtils.join(input, ",") + "]").iterator(), Encoders.STRING())
  .write()
  .text("path");

It's hard to tell what you're asking without knowing what you already have, but does this help: ?

I need to convert a Hive table to XML, but I ran into all sorts of problems doing that directly. So first I convert the Hive table to JSON, and then I will convert the JSON to XML. But when I converted the Hive table to JSON, I found that the JSON was invalid. So I just need to turn it into a valid file.

The multiple-JSON-files solution should work for a single file if you repartition it to 1 (preferably using coalesce(1) instead of repartition(1)). Note that when you change to one partition, everything has to fit on that executor. Also, you are missing the { and } around the objects.

Please provide the Java code for "save to a single JSON file", since I'm not familiar with lambda expressions in Java.

Hi, I added the Java implementation.
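
Building on the coalesce(1) suggestion from the comments, here is an untested sketch that combines it with the mapPartitions approach from the answer, producing a single valid JSON file without the zipWithIndex bookkeeping (jsonDs is the Dataset&lt;String&gt; from the answer):

import java.util.Collections;
import org.apache.commons.lang3.StringUtils;
import org.apache.spark.sql.Encoders;

jsonDs
  .coalesce(1) // merge to one partition without a full shuffle; all rows must fit on one executor
  .mapPartitions(
      input -> Collections.singletonList("[" + StringUtils.join(input, ",") + "]").iterator(),
      Encoders.STRING())
  .write()
  .text("path");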