Apache Spark: Processing JSON data with Spark


I am working with a complex, nested JSON document. Its schema is shown below.

root
 |-- businessEntity: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- payGroup: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- reportingPeriod: struct (nullable = true)
 |    |    |    |    |    |-- worker: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- category: string (nullable = true)
 |    |    |    |    |    |    |    |-- person: struct (nullable = true)
 |    |    |    |    |    |    |    |-- tax: array (nullable = true)
 |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- qtdAmount: double (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- ytdAmount: double (nullable =

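My requirement is to build a hashmap in which the key is the code concatenated with "qtdAmount" and the value is the qtdAmount value, i.e. Map.put(code + "qtdAmount", qtdAmount). How can I do this with Spark?

I tried the following shell commands: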

import org.apache.spark.sql._
import org.apache.spark.sql.functions.explode

val sqlcontext = new SQLContext(sc)
val cdm = sqlcontext.read.json("/user/edureka/CDM/cdm.json")
val spark = SparkSession.builder().appName("SQL").config("spark.some.config.option", "some-value").getOrCreate()
cdm.createOrReplaceTempView("CDM")

// show() returns Unit, so its result is not assigned to a val
spark.sql("SELECT businessEntity[0].payGroup[0] FROM CDM").show()
val address = spark.sql("SELECT businessEntity[0].payGroup[0].reportingPeriod.worker[0].person.address AS address FROM CDM")
val worker = spark.sql("SELECT businessEntity[0].payGroup[0].reportingPeriod.worker AS worker FROM CDM")
// alias the column as `tax` so that tax("tax.code") resolves
val tax = spark.sql("SELECT businessEntity[0].payGroup[0].reportingPeriod.worker[0].tax AS tax FROM CDM")
tax.select("tax.code").show()

val codes = tax.select(explode(tax("tax.code")).as("code"))
// every additional explode multiplies the rows, giving a Cartesian product of the array elements
val exploded = tax.withColumn("code", explode(tax("tax.code"))).withColumn("qtdAmount", explode(tax("tax.qtdAmount"))).withColumn("ytdAmount", explode(tax("tax.ytdAmount")))

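I am trying to put all the codes and their qtdAmount values into a map, but I cannot get it to work. Using multiple explode statements on a single DataFrame produces a Cartesian product of the elements.

Can someone help me parse JSON this complex in Spark?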
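The Cartesian product described in the question can be reproduced on a tiny in-memory record. The following is a minimal sketch, not the asker's data: the JSON string and the values "FED" and "STATE" are made up for illustration, and it assumes Spark 2.2 or later so that `spark.read.json` accepts a `Dataset[String]`. It contrasts two independent explodes with a single explode of the struct array.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// In spark-shell, getOrCreate() returns the session that already exists.
val spark = SparkSession.builder().master("local[1]").appName("explode-demo").getOrCreate()
import spark.implicits._

// Hypothetical record with two tax entries (values are invented).
val json = """{"tax":[{"code":"FED","qtdAmount":10.0,"ytdAmount":40.0},{"code":"STATE","qtdAmount":5.0,"ytdAmount":20.0}]}"""
val df = spark.read.json(Seq(json).toDS())

// Two independent explodes: every code is paired with every qtdAmount (2 x 2 = 4 rows).
val crossed = df.withColumn("code", explode($"tax.code")).withColumn("qtdAmount", explode($"tax.qtdAmount"))

// One explode of the struct array: code and qtdAmount stay together (2 rows).
val paired = df.select(explode($"tax").as("t")).select($"t.code", $"t.qtdAmount")

crossed.select("code", "qtdAmount").show()
paired.show()
```

Exploding the array of structs once and then selecting its fields keeps each code next to its own amount, which is what the map in the question needs.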

You can get code and qtdAmount this way:

import sqlcontext.implicits._

cdm.select(
  $"businessEntity.element.payGroup.element.reportingPeriod.worker.element.tax.element.code".as("code"),
  $"businessEntity.element.payGroup.element.reportingPeriod.worker.element.tax.element.qtdAmount".as("qtdAmount")
).show
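As far as I know, `element` is only the label that printSchema prints for array entries, not a real field name, so a path containing it may fail to resolve. An alternative sketch, offered under assumptions rather than as the answer's method, is to explode one array level at a time and then collect the pairs into a map. The sample record below is invented to match the schema in the question, and the names `sample`, `taxes` and `qtdByCode` are hypothetical. It assumes Spark 2.2 or later; in spark-shell, paste it with `:paste` because of the multi-line method chain.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat, explode, lit}

val spark = SparkSession.builder().master("local[1]").appName("nested-json-map").getOrCreate()
import spark.implicits._

// Made-up record shaped like the schema in the question (values are invented).
val sample =
  """{"businessEntity":[{"payGroup":[{"reportingPeriod":{"worker":[""" +
  """{"category":"A","tax":[""" +
  """{"code":"FED","qtdAmount":10.0,"ytdAmount":40.0},""" +
  """{"code":"STATE","qtdAmount":5.0,"ytdAmount":20.0}]}]}}]}]}"""
val cdm = spark.read.json(Seq(sample).toDS())

// Explode each array level in turn; struct fields are reached with plain dots.
val taxes = cdm
  .select(explode($"businessEntity").as("be"))
  .select(explode($"be.payGroup").as("pg"))
  .select(explode($"pg.reportingPeriod.worker").as("w"))
  .select(explode($"w.tax").as("t"))
  .select(concat($"t.code", lit("qtdAmount")).as("key"), $"t.qtdAmount".as("qtdAmount"))
  .na.drop() // skip entries with a missing code or amount

// Bring the (key, value) pairs to the driver and build an ordinary Scala Map.
val qtdByCode: Map[String, Double] = taxes.as[(String, Double)].collect().toMap
```

Two caveats on this design: `collect()` pulls every pair onto the driver, so it only suits results that fit in memory, and `toMap` keeps the last value when the same code appears for several workers, so the key may need a worker identifier if that matters.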

For more details, see