
Converting nested JSON to a DataFrame using Spark/Scala


I have a nested JSON that I need to flatten into a DataFrame, without defining or exploding any of the column names.

val df = sqlCtx.read.option("multiLine", true).json("test.json")
This is what my data looks like:

[
  {
    "symbol": “TEST3",
    "timestamp": "2019-05-07 16:00:00",
    "priceData": {
      "open": "1177.2600",
      "high": "1179.5500",
      "low": "1176.6700",
      "close": "1179.5500",
      "volume": "49478"
    }
  },
  {
    "symbol": “TEST4",
    "timestamp": "2019-05-07 16:00:00",
    "priceData": {
      "open": "189.5660",
      "high": "189.9100",
      "low": "189.5100",
      "close": "189.9100",
      "volume": "267986"
    }
  }
]

Here is one way to do it, using a DataFrameFlatter class:
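The original implementation was only linked, not quoted, so here is a minimal sketch of what such a flattener might look like (the object name follows the answer; the body and the flatten/flattenSchema method names are assumptions). It recursively walks the DataFrame's schema and selects every leaf field, aliasing it with its dotted path so that nested names such as priceData.close are preserved:

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Sketch only: recursively flattens struct columns into dotted leaf columns.
object DataFrameFlatter {
  // Build a select expression for every leaf field, aliasing nested
  // fields with their dotted path (e.g. priceData.close).
  private def flattenSchema(schema: StructType, prefix: String = ""): Seq[Column] =
    schema.fields.toSeq.flatMap { field =>
      val name = if (prefix.isEmpty) field.name else s"$prefix.${field.name}"
      field.dataType match {
        case st: StructType => flattenSchema(st, name)
        case _              => Seq(col(name).as(name))
      }
    }

  def flatten(df: DataFrame): DataFrame =
    df.select(flattenSchema(df.schema): _*)
}

Applied to the DataFrame read above:

DataFrameFlatter.flatten(df).show()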

And the output:

+---------------+--------------+-------------+--------------+----------------+------+-------------------+
|priceData.close|priceData.high|priceData.low|priceData.open|priceData.volume|symbol|          timestamp|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
|      1179.5500|     1179.5500|    1176.6700|     1177.2600|           49478| TEST3|2019-05-07 16:00:00|
|       189.9100|      189.9100|     189.5100|      189.5660|          267986| TEST4|2019-05-07 16:00:00|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
Alternatively, you can just do a plain select:

df.select(
  "priceData.close", 
  "priceData.high", 
  "priceData.low", 
  "priceData.open", 
  "priceData.volume", 
  "symbol", 
  "timestamp").show
Output:

+---------+---------+---------+---------+------+------+-------------------+
|    close|     high|      low|     open|volume|symbol|          timestamp|
+---------+---------+---------+---------+------+------+-------------------+
|1179.5500|1179.5500|1176.6700|1177.2600| 49478| TEST3|2019-05-07 16:00:00|
| 189.9100| 189.9100| 189.5100| 189.5660|267986| TEST4|2019-05-07 16:00:00|
+---------+---------+---------+---------+------+------+-------------------+
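Note that the plain select drops the priceData. prefix, because Spark names each selected nested field after its leaf. If you want to keep the dotted names without the flattener class, one option (an addition here, not part of the original answer) is to alias each column explicitly:

import org.apache.spark.sql.functions.col

df.select(
  col("priceData.close").as("priceData.close"),
  col("priceData.high").as("priceData.high"),
  col("priceData.low").as("priceData.low"),
  col("priceData.open").as("priceData.open"),
  col("priceData.volume").as("priceData.volume"),
  col("symbol"),
  col("timestamp")).show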

Which version of Spark are you using? Spark version = 2.3.0
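Since Spark 2.3.0 is in use: from Spark 2.0 onward, SparkSession is the usual entry point rather than SQLContext, so the read above could equivalently be written as follows (a minimal sketch; the app name is arbitrary):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("flatten-json").getOrCreate()
// multiLine is needed because each JSON record spans several lines
val df = spark.read.option("multiLine", value = true).json("test.json")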