Scala: how to aggregate over arrays in JSON?

I have a question about aggregating over a nested JSON array. My example order dataframe (shown as JSON) looks like this:

{
  "orderId": "oi1",
  "orderLines": [
    {
      "productId": "p1",
      "quantity": 1,
      "sequence": 1,
      "totalPrice": {
        "gross": 50,
        "net": 40,
        "tax": 10
      }
    },
    {
      "productId": "p2",
      "quantity": 3,
      "sequence": 2,
      "totalPrice": {
        "gross": 300,
        "net": 240,
        "tax": 60
      }
    }
  ]
}
How can I sum the quantity across all the lines of a given order using Spark SQL?

e.g. in this case, 1 + 3 = 4

I wanted to write something like the following, but there doesn't seem to be a built-in function to support it (unless I've missed it, which is quite possible!):

SELECT
  orderId,
  sum_array(orderLines.quantity) AS totalQuantityItems
FROM
  orders
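There is no `sum_array` built-in, but one workable sketch (not the asker's code) is to explode the array first so the ordinary `sum` aggregate can do the work. This assumes a `SparkSession` named `spark` and the order JSON stored in a file named `orders.json` (both assumptions for illustration):

```scala
import org.apache.spark.sql.functions.{col, explode, sum}

val orders = spark.read.json("orders.json")

orders
  .select(col("orderId"), explode(col("orderLines")).as("line"))
  .groupBy("orderId")
  .agg(sum("line.quantity").as("totalQuantityItems"))
  .show()
// for order oi1: 1 + 3 = 4
```

Exploding turns each element of `orderLines` into its own row, after which this is a plain group-and-sum.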
Perhaps a custom UDF (Scala) is needed? If so, what would that look like / are there any examples? The same goes for summing items even deeper in the nesting:

SELECT
  orderId,
  sum_array(orderLines.totalPrice.net) AS totalOrderNet
FROM
  orders
Once the dataset is read in:

You could make it "prettier" using Scala's for-comprehension:

val quantities = for {
  o <- orders
  id = o._1
  quantity <- o._2
} yield (id, quantity._2)

import org.apache.spark.sql.functions.sum

val sumPerOrder = quantities.
  toDF("id", "quantity"). // <-- back to DataFrames to have names
  groupBy("id").
  agg(sum("quantity") as "sum")
scala> sumPerOrder.show
+---+---+
| id|sum|
+---+---+
|oi1|  4|
+---+---+
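For the snippet above to compile, `orders` needs to be a collection of pairs of an id and its line items. A minimal, assumed definition (a simplified shape chosen for illustration, not the asker's schema):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// (orderId, Seq((productId, quantity)))
val orders = Seq(
  ("oi1", Seq(("p1", 1), ("p2", 3)))
).toDS
```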