Convert nested JSON to a flat DataFrame using Spark/Scala
I have a nested JSON that I need to convert into a flat dataframe, without defining or exploding any of its column names:
val df = sqlCtx.read.option("multiLine", true).json("test.json")
This is what my data looks like:
[
  {
    "symbol": "TEST3",
    "timestamp": "2019-05-07 16:00:00",
    "priceData": {
      "open": "1177.2600",
      "high": "1179.5500",
      "low": "1176.6700",
      "close": "1179.5500",
      "volume": "49478"
    }
  },
  {
    "symbol": "TEST4",
    "timestamp": "2019-05-07 16:00:00",
    "priceData": {
      "open": "189.5660",
      "high": "189.9100",
      "low": "189.5100",
      "close": "189.9100",
      "volume": "267986"
    }
  }
]
Here is one approach using a DataFrameFlatter class (the linked implementation is not reproduced here):
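Since the original DataFrameFlatter code is not shown, here is a minimal pure-Scala sketch of the flattening idea: recursively expanding nested structures into top-level entries keyed by dot paths, which is what the flattener does to nested StructType columns. The `flatten` helper below is illustrative, not the original class.

```scala
// Hedged sketch: flatten nested Maps into dot-path keys, mirroring how a
// DataFrame flattener turns nested struct columns into "parent.child" columns.
def flatten(m: Map[String, Any], prefix: String = ""): Map[String, Any] =
  m.flatMap {
    // Nested map: recurse, extending the dot-path prefix.
    case (k, v: Map[String, Any] @unchecked) => flatten(v, prefix + k + ".")
    // Leaf value: emit it under its full dot path.
    case (k, v)                              => Map(prefix + k -> v)
  }

// One record from the sample data above, modeled as a nested Map.
val record = Map(
  "symbol"    -> "TEST3",
  "timestamp" -> "2019-05-07 16:00:00",
  "priceData" -> Map(
    "open"   -> "1177.2600",
    "high"   -> "1179.5500",
    "low"    -> "1176.6700",
    "close"  -> "1179.5500",
    "volume" -> "49478"
  )
)

val flat = flatten(record)
// flat now has keys like "priceData.open" and "symbol" instead of nesting.
```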
And the output:
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
|priceData.close|priceData.high|priceData.low|priceData.open|priceData.volume|symbol| timestamp|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
| 1179.5500| 1179.5500| 1176.6700| 1177.2600| 49478| TEST3|2019-05-07 16:00:00|
| 189.9100| 189.9100| 189.5100| 189.5660| 267986| TEST4|2019-05-07 16:00:00|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
Alternatively, you can just do a plain select:
df.select(
"priceData.close",
"priceData.high",
"priceData.low",
"priceData.open",
"priceData.volume",
"symbol",
"timestamp").show
Output:
+---------+---------+---------+---------+------+------+-------------------+
| close| high| low| open|volume|symbol| timestamp|
+---------+---------+---------+---------+------+------+-------------------+
|1179.5500|1179.5500|1176.6700|1177.2600| 49478| TEST3|2019-05-07 16:00:00|
| 189.9100| 189.9100| 189.5100| 189.5660|267986| TEST4|2019-05-07 16:00:00|
+---------+---------+---------+---------+------+------+-------------------+
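If you don't want to list the columns by hand, the select list can be derived by walking the schema. A hedged sketch of that walk, modeling the schema as a nested structure rather than using Spark's StructType directly (the `Field`/`Leaf`/`Node` names below are illustrative assumptions):

```scala
// Illustrative stand-in for Spark's StructType/StructField hierarchy.
sealed trait Field
final case class Leaf(name: String) extends Field
final case class Node(name: String, children: List[Field]) extends Field

// Recursively collect fully qualified dot-path column names.
def columnPaths(fields: List[Field], prefix: String = ""): List[String] =
  fields.flatMap {
    case Leaf(n)     => List(prefix + n)
    case Node(n, cs) => columnPaths(cs, prefix + n + ".")
  }

// The schema of the sample data above.
val schema = List(
  Node("priceData", List(Leaf("open"), Leaf("high"), Leaf("low"),
                         Leaf("close"), Leaf("volume"))),
  Leaf("symbol"),
  Leaf("timestamp")
)

val cols = columnPaths(schema)
// The resulting paths could then feed df.select(cols.head, cols.tail: _*).
```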
Comment: Which version of Spark are you using? — Spark version = 2.3.0