How to convert a Row to JSON in Spark 2 Scala

json scala apache-spark

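Is there a simple way to convert a given Row object to JSON?

I found a way to convert a whole DataFrame to JSON output:

But I only want to convert a single Row to JSON. Below is pseudocode for what I am trying to do.

More precisely, I read JSON as input into a DataFrame. I am producing a new output that is mostly column-based, but has one JSON field for all the information that does not fit into the columns.

My question is: what is the simplest way to write this function: convertRowToJson()?

Psidom's solution: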

def convertRowToJSON(row: Row): String = {
    val m = row.getValuesMap(row.schema.fieldNames)
    JSONObject(m).toString()
}

only works if the Row has a single level, with no nested Rows. This is the schema:

StructType(
    StructField(indicator,StringType,true),   
    StructField(range,
    StructType(
        StructField(currency_code,StringType,true),
        StructField(maxrate,LongType,true), 
        StructField(minrate,LongType,true)),true))

I also tried Artem's suggestion, but it does not compile:

def row2DataFrame(row: Row, sqlContext: SQLContext): DataFrame = {
  val sparkContext = sqlContext.sparkContext
  import sparkContext._
  import sqlContext.implicits._
  import sqlContext._
  val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
  val dataFrame = rowRDD.toDF() //XXX does not compile
  dataFrame
}

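Essentially, you can have a DataFrame that contains just one row. So you can try filtering your initial DataFrame down to that row and then converting it to JSON.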
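The filtering idea above could look roughly like the following. This is a minimal sketch, assuming a local SparkSession and a key column named `id` that identifies the row; the names `SingleRowJson`, `rowAsJson`, and the sample data are illustrative and do not come from the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SingleRowJson {
  // Keep only the rows whose (assumed) "id" column matches, then let Spark
  // serialize them. toJSON yields one JSON string per row, and head(1)
  // fetches at most one of them to the driver.
  def rowAsJson(df: DataFrame, id: Long): Option[String] =
    df.filter(col("id") === id).toJSON.head(1).headOption

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-row-json")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data only.
    val df = Seq((1L, "a"), (2L, "b")).toDF("id", "name")
    println(rowAsJson(df, 2L))
    spark.stop()
  }
}
```

Because the serialization is done by `toJSON`, nested structs in the selected row are written as nested JSON objects. The cost is one Spark job per lookup, so this suits occasional lookups rather than per-row processing.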

You can use getValuesMap to convert the Row object to a Map, and then convert that Map to JSON:

import scala.util.parsing.json.JSONObject
import org.apache.spark.sql._

val df = Seq((1,2,3),(2,3,4)).toDF("A", "B", "C")    
val row = df.first()          // this is an example row object

def convertRowToJSON(row: Row): String = {
    val m = row.getValuesMap(row.schema.fieldNames)
    JSONObject(m).toString()
}

convertRowToJSON(row)
// res46: String = {"A" : 1, "B" : 2, "C" : 3}

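JSON has a schema but a Row does not, so you need to apply a schema to the Row and then convert it to JSON. Here is one way to do it:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Assumes a SQLContext named sqlContext is in scope, as in the original answer.
def convertRowToJson(row: Row): String = {

  val schema = StructType(
      StructField("name", StringType, true) ::
      StructField("meta", StringType, false) ::  Nil)

  // createDataFrame expects an RDD[Row], so wrap the single row first;
  // toJSON returns a Dataset[String], so take its first element.
  val rowRDD = sqlContext.sparkContext.makeRDD(row :: Nil)
  sqlContext.createDataFrame(rowRDD, schema).toJSON.first
}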

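I combined the suggestions from Artem, KiranM, and Psidom. After a lot of trial and error, I came up with this solution, which I tested on nested structures:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}

def row2Json(row: Row, sqlContext: SQLContext): String = {
  // Wrap the single row in an RDD so that a DataFrame can be built from it
  val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
  // row.schema must be set, i.e. the row has to come from a DataFrame
  val dataframe = sqlContext.createDataFrame(rowRDD, row.schema)
  dataframe.toJSON.first
}

This solution works, but only on the driver: it needs the SparkContext, so it cannot be called from code that runs on the executors (for example inside a map over the DataFrame).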

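I need to read JSON input and produce JSON output. Most fields are handled individually, but a few JSON sub-objects just need to be preserved.

When Spark reads a DataFrame, it converts each record to a Row. The Row is a JSON-like structure that can be transformed and written out as JSON.

But I need to convert some sub-JSON structures to strings to use as new fields.

This can be done like this: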

dataFrameWithJsonField = dataFrame.withColumn("address_json", to_json($"location.address"))

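location.address is the path to the sub-JSON object in the incoming JSON-based DataFrame.

address_json is the name of the column that holds the object converted to a JSON string.

to_json is implemented in Spark 2.1.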
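to_json is not limited to a sub-object. Packing all columns into one struct first gives a JSON string for the whole row, nested structs included, and the work stays on the executors. The following is a minimal sketch, assuming Spark 2.1 or later; `RowToJsonColumn`, `withRowJson`, the `row_json` column name, and the two case classes (modelled on the schema in the question) are illustrative names, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object RowToJsonColumn {
  // Hypothetical case classes mirroring the nested schema from the question.
  case class RateRange(currency_code: String, maxrate: Long, minrate: Long)
  case class Rate(indicator: String, range: RateRange)

  // Pack every column into a single struct and serialize it with to_json.
  // Assumes plain column names (no dots or backticks).
  def withRowJson(df: DataFrame, jsonCol: String = "row_json"): DataFrame =
    df.withColumn(jsonCol, to_json(struct(df.columns.map(col): _*)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-to-json-column")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data only.
    val df = Seq(Rate("LIBOR", RateRange("USD", 10L, 1L))).toDF()
    withRowJson(df).select("row_json").show(truncate = false)
    spark.stop()
  }
}
```

Unlike the single-row DataFrame approach, this never touches the SparkContext per row, so it can be used as an ordinary column expression in a larger pipeline.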

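If you generate the output JSON with json4s, the address JSON should first be parsed into its AST representation; otherwise the address part ends up in the output as an escaped string.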
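The difference between the two cases can be sketched as follows, assuming the json4s Jackson module that ships with Spark is on the classpath. The object name `EmbedJson`, both method names, and the `name`/`address` field names are hypothetical and only serve the illustration.

```scala
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods.{compact, parse, render}

object EmbedJson {
  // Embedding the raw string makes json4s treat it as a JSON string value,
  // so the quotes inside it are escaped in the output.
  def embedAsString(name: String, addressJson: String): String =
    compact(render(("name" -> name) ~ ("address" -> addressJson)))

  // Parsing the string into a JValue first embeds it as a real JSON object.
  def embedAsAst(name: String, addressJson: String): String =
    compact(render(("name" -> name) ~ ("address" -> parse(addressJson))))
}
```

For example, with the address string `{"city":"Oslo"}`, `embedAsString` produces `"address":"{\"city\":\"Oslo\"}"`, while `embedAsAst` produces `"address":{"city":"Oslo"}`.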

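Note that the Scala class scala.util.parsing.json.JSONObject is deprecated and does not support null values:

@deprecated("This class will be removed.", "2.11.0")

"JSONFormat.defaultFormat doesn't handle null values"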
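Given that deprecation, one option is to render the value map without JSONObject at all. The following is a minimal sketch using only the Scala standard library; `SimpleJson` and `render` are hypothetical names, and it assumes the values are strings, numbers, booleans, nulls, nested Maps, or Seqs (for instance a Map built by a recursive getValuesMap-style helper). It is not a complete JSON encoder: non-finite doubles and other value types are simply written with toString.

```scala
object SimpleJson {
  // Quote a string and escape the characters JSON does not allow raw.
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      // Remaining control characters become a backslash-u escape.
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.append('"').toString
  }

  // Recursively render a value; null is written as the JSON literal null.
  def render(value: Any): String = value match {
    case null => "null"
    case s: String => quote(s)
    case m: Map[_, _] =>
      m.map { case (k, v) => quote(k.toString) + ":" + render(v) }.mkString("{", ",", "}")
    case xs: Seq[_] => xs.map(render).mkString("[", ",", "]")
    case other => other.toString // numbers and booleans
  }
}
```

A call such as `SimpleJson.render(Map("a" -> 1, "b" -> null))` keeps the null field instead of failing on it. Because the function is pure and has no Spark dependency, it can also run inside a map on the executors.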

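I had the same problem: I have Parquet files with a canonical schema (no arrays), and I only want to get JSON events. I did the following, and it seems to work fine (Spark 2.1):

If you want to iterate over the DataFrame, you can convert it directly into a new Dataset of JSON strings and iterate over that: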

val df_json=df.toJSON

Correction: it actually only works for the first level of a map/struct, not for nested maps. You only see the values, not the keys.

@SamiBadawi Where can I find a solution for nested maps? I also have problems with nesting.

Thanks for your suggestion. I tried your approach with the row2DataFrame function shown in the question (the line val dataFrame = rowRDD.toDF() is the one marked XXX), and it did not compile.

Thanks Arnon. There has been some discussion about modernizing JSON support in Scala.

Please edit your question, or otherwise use comments. In any case, please read the community rules.

import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import scala.util.parsing.json.JSONFormat.ValueFormatter
import scala.util.parsing.json.{JSONArray, JSONFormat, JSONObject}

def getValuesMap[T](row: Row, schema: StructType): Map[String,Any] = {
  schema.fields.map {
    field =>
      try{
        if (field.dataType.typeName.equals("struct")){
          field.name -> getValuesMap(row.getAs[Row](field.name),   field.dataType.asInstanceOf[StructType]) 
        }else{
          field.name -> row.getAs[T](field.name)
        }
      }catch {case e : Exception =>{field.name -> null.asInstanceOf[T]}}
  }.filter(xy => xy._2 != null).toMap
}

def convertRowToJSON(row: Row, schema: StructType): JSONObject = {
  val m: Map[String, Any] = getValuesMap(row, schema)
  JSONObject(m)
}
//I guess since I am using Any and not nothing the regular ValueFormatter is not working, and I had to add case jmap : Map[String,Any] => JSONObject(jmap).toString(defaultFormatter)
val defaultFormatter : ValueFormatter = (x : Any) => x match {
  case s : String => "\"" + JSONFormat.quoteString(s) + "\""
  case jo : JSONObject => jo.toString(defaultFormatter)
  case jmap : Map[String,Any] => JSONObject(jmap).toString(defaultFormatter)
  case ja : JSONArray => ja.toString(defaultFormatter)
  case other => other.toString
}

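val someFile = "s3a://bucket/file"
val df: DataFrame = sqlContext.read.load(someFile)
val schema: StructType = df.schema
// Spark has no built-in Encoder for JSONObject (it wraps a Map[String, Any]),
// so render each row to a JSON string and collect a Dataset[String] instead.
import sqlContext.implicits._
val jsons: Dataset[String] = df.map(row => convertRowToJSON(row, schema).toString(defaultFormatter))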