Parsing a JSON file with Spark Scala

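I have a JSON source data file like the one shown below, and I need the result in a completely different format, also shown below. Can I achieve this with Spark Scala? Thanks for any help with this.

JSON source data file: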

{
  "APP": [
    {
      "E": 1566799999225,
      "V": 44.0
    },
    {
      "E": 1566800002758,
      "V": 61.0
    }
  ],
  "ASP": [
    {
      "E": 1566800009446,
      "V": 23.399999618530273
    }
  ],
  "TT": 0,
  "TVD": [
    {
      "E": 1566799964040,
      "V": 50876515
    }
  ],
  "VIN": "FU74HZ501740XXXXX"
}
Expected result:

JSON schema:

|-- APP: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- ASP: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- ATO: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- MSG_TYPE: string (nullable = true)
|-- RPM: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: double (nullable = true)
|-- TT: long (nullable = true)
|-- TVD: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- E: long (nullable = true)
|    |    |-- V: long (nullable = true)
|-- VIN: string (nullable = true)
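
The schema printed above can also be declared explicitly instead of being inferred on every read. The following is only a sketch under that assumption: the names `doubleReading`, `longReading` and `schema` are made up for illustration and do not appear in the original question.

```scala
import org.apache.spark.sql.types._

// Element type shared by APP, ASP, ATO and RPM: struct<E: long, V: double>
val doubleReading = StructType(Seq(
  StructField("E", LongType, nullable = true),
  StructField("V", DoubleType, nullable = true)
))

// TVD carries long values instead of doubles: struct<E: long, V: long>
val longReading = StructType(Seq(
  StructField("E", LongType, nullable = true),
  StructField("V", LongType, nullable = true)
))

// Top-level schema, in the same field order as the printed tree
val schema = StructType(Seq(
  StructField("APP", ArrayType(doubleReading, containsNull = true), nullable = true),
  StructField("ASP", ArrayType(doubleReading, containsNull = true), nullable = true),
  StructField("ATO", ArrayType(doubleReading, containsNull = true), nullable = true),
  StructField("MSG_TYPE", StringType, nullable = true),
  StructField("RPM", ArrayType(doubleReading, containsNull = true), nullable = true),
  StructField("TT", LongType, nullable = true),
  StructField("TVD", ArrayType(longReading, containsNull = true), nullable = true),
  StructField("VIN", StringType, nullable = true)
))
```

Such a schema could be passed to the reader with `.schema(schema)`, so that fields missing from a given file (here `ATO`, `MSG_TYPE` and `RPM`) still appear as null columns.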

Here is a solution that parses the JSON into a Spark DataFrame that fits your data:

    val input = "{\"APP\":[{\"E\":1566799999225,\"V\":44.0},{\"E\":1566800002758,\"V\":61.0}],\"ASP\":[{\"E\":1566800009446,\"V\":23.399999618530273}],\"TT\":0,\"TVD\":[{\"E\":1566799964040,\"V\":50876515}],\"VIN\":\"FU74HZ501740XXXXX\"}"

    import org.apache.spark.sql.functions.{col, explode}
    import sparkSession.implicits._

    // Each explode emits one row per array element, so exploding several
    // arrays in sequence yields every combination of their elements.
    val outputDataFrame = sparkSession.read
      .option("multiline", true)
      .option("mode", "PERMISSIVE")
      .json(Seq(input).toDS)
      .withColumn("APP", explode(col("APP")))
      .withColumn("ASP", explode(col("ASP")))
      .withColumn("TVD", explode(col("TVD")))
      .select(
        col("VIN"), col("TT"),
        col("APP").getItem("E").as("APP_E"),
        col("APP").getItem("V").as("APP_V"),
        col("ASP").getItem("E").as("ASP_E"),
        col("ASP").getItem("V").as("ASP_V"),
        col("TVD").getItem("E").as("TVD_E"),
        col("TVD").getItem("V").as("TVD_V")
      )

    outputDataFrame.show(truncate = false)

    /*
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
|VIN              |TT |APP_E        |APP_V|ASP_E        |ASP_V             |TVD_E        |TVD_V   |
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
|FU74HZ501740XXXXX|0  |1566799999225|44.0 |1566800009446|23.399999618530273|1566799964040|50876515|
|FU74HZ501740XXXXX|0  |1566800002758|61.0 |1566800009446|23.399999618530273|1566799964040|50876515|
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
     */
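
Chaining several `explode` calls pairs every element of one array with every element of the others, which is why the single ASP and TVD readings are repeated on both rows above. If that repetition is unwanted, one possible alternative is a "long" layout with one row per reading. This is only a sketch: the `readings` helper, the `SIGNAL` column name, the local `spark` session and the cast of `V` to double are assumptions made here, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, lit}

// Assumption: a local session, created here only to keep the sketch self-contained
val spark = SparkSession.builder().master("local[1]").appName("long-format-sketch").getOrCreate()
import spark.implicits._

val input = """{"APP":[{"E":1566799999225,"V":44.0},{"E":1566800002758,"V":61.0}],"ASP":[{"E":1566800009446,"V":23.399999618530273}],"TT":0,"TVD":[{"E":1566799964040,"V":50876515}],"VIN":"FU74HZ501740XXXXX"}"""
val df = spark.read.json(Seq(input).toDS)

// Hypothetical helper: explode one array column into (VIN, TT, SIGNAL, E, V) rows
def readings(df: DataFrame, signal: String): DataFrame =
  df.select(col("VIN"), col("TT"), explode(col(signal)).as("r"))
    .select(
      col("VIN"), col("TT"),
      lit(signal).as("SIGNAL"),
      col("r.E").as("E"),
      col("r.V").cast("double").as("V") // TVD values are long; cast so all signals share one type
    )

// One DataFrame per signal, stacked by position instead of cross-joined
val longDf = Seq("APP", "ASP", "TVD").map(readings(df, _)).reduce(_ union _)

longDf.show(truncate = false)
```

Because each array is exploded in its own `select`, the row count is the sum of the array lengths rather than their product.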

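You can start by reading the JSON file:

    import org.apache.spark.sql.DataFrame

    val inputDataFrame: DataFrame = sparkSession
      .read
      .option("multiline", true)
      .json(yourJsonPath)

Then you can create a simple rule to pick out APP, ASP, ATO, since they are the fields in the input with a struct data type (each is an array of structs):

    import org.apache.spark.sql.types.StructField

    val inputDataFrameFields: Array[StructField] = inputDataFrame.schema.fields
    var snColumn = new Array[String](inputDataFrame.schema.length)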

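    for (x

Hello @SimbaPK, I don't need the data in a structured format. I need the data in JSON format, as shown in the expected result. Any help would be greatly appreciated.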
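
The loop in the answer above is cut off in this copy, and the expected result from the question is missing too, so the exact target shape is unknown. As a hedged sketch only, this is one way to derive the array-of-struct column names from the schema and to turn rows back into JSON strings. The helper `arrayOfStructFields`, the `signalColumns` and `jsonLines` names, and the local `spark` session are assumptions introduced here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}

// Assumption: a local session, created here only to keep the sketch self-contained
val spark = SparkSession.builder().master("local[1]").appName("schema-rule-sketch").getOrCreate()
import spark.implicits._

val input = """{"APP":[{"E":1566799999225,"V":44.0},{"E":1566800002758,"V":61.0}],"ASP":[{"E":1566800009446,"V":23.399999618530273}],"TT":0,"TVD":[{"E":1566799964040,"V":50876515}],"VIN":"FU74HZ501740XXXXX"}"""
val inputDataFrame = spark.read.json(Seq(input).toDS)

// Hypothetical helper: names of the top-level columns typed array<struct<...>>
def arrayOfStructFields(schema: StructType): Seq[String] =
  schema.fields.toSeq.collect {
    case StructField(name, ArrayType(_: StructType, _), _, _) => name
  }

// Columns that hold readings, found from the schema rather than hard-coded
val signalColumns = arrayOfStructFields(inputDataFrame.schema)

// Dataset.toJSON renders each row as one JSON string; any reshaping
// (select, explode, struct) would be applied before this call
val jsonLines = inputDataFrame.toJSON.collect()

signalColumns.foreach(println)
jsonLines.foreach(println)
```

Matching on the schema keeps the rule working when a file contains extra signals such as ATO or RPM, which the sample input here does not include.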