Java: How to convert Spark DataFrame output to JSON?


I am reading a CSV file with the SparkSQL context.

Code:

m.put("path", CSV_DIRECTORY+file.getOriginalFilename());
m.put("inferSchema", "true"); // Automatically infer data types else string by default
m.put("header", "true");      // Use first line of all files as header         
m.put("delimiter", ";");

DataFrame df = sqlContext.load("com.databricks.spark.csv",m);              
df.printSchema();
Using df.printSchema() to get the column names and data types.

O/p:

m.put("path", CSV_DIRECTORY+file.getOriginalFilename());
m.put("inferSchema", "true"); // Automatically infer data types else string by default
m.put("header", "true");      // Use first line of all files as header         
m.put("delimiter", ";");

DataFrame df = sqlContext.load("com.databricks.spark.csv",m);              
df.printSchema();
|-- id: integer (nullable = true)
|-- ApplicationNo: string (nullable = true)
|-- Applidate: timestamp (nullable = true)
{"column":"id","datatype":"integer"}
What is the return type of the printSchema statement? How can I convert its output to JSON format, and how can I convert the DataFrame itself to JSON?

Required O/p:

m.put("path", CSV_DIRECTORY+file.getOriginalFilename());
m.put("inferSchema", "true"); // Automatically infer data types else string by default
m.put("header", "true");      // Use first line of all files as header         
m.put("delimiter", ";");

DataFrame df = sqlContext.load("com.databricks.spark.csv",m);              
df.printSchema();
|--id : integer (nullable = true)
|-- ApplicationNo: string (nullable = true)
|-- Applidate: timestamp(nullable = true)
{"column":"id","datatype":"integer"}
DataType has a json() method and a fromJson() method that can be used to serialize/deserialize a schema.

val df = sqlContext.read.....load()
val jsonString: String = df.schema.json   // the schema serialized as a JSON string
val schema: StructType = DataType.fromJson(jsonString).asInstanceOf[StructType]
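Since the question is in Java and the required output is one JSON object per column, here is a minimal Java sketch that walks the schema fields directly (assuming the Spark 1.x DataFrame API used in the question; the helper name printSchemaAsJson is made up):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.types.StructField;

// Hypothetical helper: emit one JSON object per column of the DataFrame schema.
static void printSchemaAsJson(DataFrame df) {
    for (StructField field : df.schema().fields()) {
        // field.name() is the column name; dataType().typeName() yields the
        // lower-case type name, e.g. "integer", "string", "timestamp".
        System.out.println(String.format("{\"column\":\"%s\",\"datatype\":\"%s\"}",
                field.name(), field.dataType().typeName()));
    }
}

For the schema shown above this prints {"column":"id","datatype":"integer"} and so on, which matches the required O/p.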

Spark SQL way

df.createOrReplaceTempView("<table_name>")
spark.sql("SELECT COLLECT_SET(STRUCT(<field_name>)) AS `` FROM <table_name> LIMIT 1").coalesce(1).write.format("org.apache.spark.sql.json").mode("overwrite").save(<Blob Path1/ ADLS Path1>)
The output will look like this:

{"":[{<field_name>:<field_value1>},{<field_name>:<field_value2>}]}
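With the columns from the question filled in, the first statement could look roughly like this (a sketch only, assuming the Spark 2.x Java API with a SparkSession named spark and df as a Dataset<Row>, as the answer implies; the view name and output path are made up):

// Hypothetical concrete version of the pattern above, using the question's columns.
df.createOrReplaceTempView("applications");
spark.sql("SELECT COLLECT_SET(STRUCT(id, ApplicationNo)) AS `` FROM applications LIMIT 1")
    .coalesce(1)
    .write()
    .format("org.apache.spark.sql.json")
    .mode("overwrite")
    .save("/tmp/applications_json"); // made-up output path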
Here the header can be avoided with the following 3 lines (assuming there is no tilde in the data):

val jsonToCsvDF = spark.read.format("com.databricks.spark.csv").option("delimiter", "~").load(<Blob Path1/ ADLS Path1>)
jsonToCsvDF.createOrReplaceTempView("json_to_csv")
spark.sql("SELECT SUBSTR(`_c0`, 5, LENGTH(`_c0`) - 5) FROM json_to_csv").coalesce(1).write.option("header", false).mode("overwrite").text(<Blob Path2/ ADLS Path2>)
Hope this helps.

Thanks @Hamel, df.schema().json(); did the job.
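For reference, the two conversions the question asks about can be sketched in Java roughly as follows (assuming Spark 1.4+, where DataFrame.write() is available; the output path is made up):

import java.util.List;

// 1) The schema as a JSON string (the call mentioned in the comment above).
String schemaJson = df.schema().json();

// 2) The DataFrame rows as JSON: either write them out as JSON files ...
df.write().json(CSV_DIRECTORY + "out.json"); // made-up output path

// ... or collect them as one JSON string per row.
List<String> rowsAsJson = df.toJSON().toJavaRDD().collect();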