Apache Spark: How do I aggregate columns into a JSON array?


How can I transform the data below so that I can store it in Elasticsearch?

Below is a dataset of beans, which I want to aggregate by product into a JSON array.

List<Bean> data = new ArrayList<Bean>();
data.add(new Bean("book","John",59));
data.add(new Bean("book","Björn",61));
data.add(new Bean("tv","Roger",36));
Dataset<Row> ds = spark.createDataFrame(data, Bean.class);

ds.show(false);

+------+-------+---------+
|amount|product|purchaser|
+------+-------+---------+
|59    |book   |John     |
|61    |book   |Björn    |
|36    |tv     |Roger    |
+------+-------+---------+


ds = ds.groupBy(col("product")).agg(collect_list(map(ds.col("purchaser"),ds.col("amount")).as("map")));
ds.show(false);

+-------+---------------------------------------------+
|product|collect_list(map(purchaser, amount) AS `map`)|
+-------+---------------------------------------------+
|tv     |[[Roger -> 36]]                              |
|book   |[[John -> 59], [Björn -> 61]]                |
+-------+---------------------------------------------+
Solution:

ds.groupBy(col("product"))
  .agg(collect_list(to_json(struct(col("purchaser"), col("amount"))).alias("json")));

First use to_json to turn each row into a JSON string, then use collect_list to gather those strings per product.
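
To check what the result should look like without starting a Spark session, the two steps can be mimicked in plain Java. This is only a rough sketch and not Spark code: the Purchase class and the JsonAggregationSketch name are hypothetical stand-ins for the question's Bean, the JSON is built by hand with minimal escaping, and java.util.stream is swapped in for groupBy/collect_list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class JsonAggregationSketch {

    // Hypothetical stand-in for the question's Bean class.
    static final class Purchase {
        final String product;
        final String purchaser;
        final int amount;

        Purchase(String product, String purchaser, int amount) {
            this.product = product;
            this.purchaser = purchaser;
            this.amount = amount;
        }
    }

    // Step 1 (the role to_json(struct(...)) plays): one row -> one JSON object string.
    // Only backslashes and double quotes are escaped; a real serializer handles more.
    static String toJson(Purchase p) {
        String name = p.purchaser.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"purchaser\":\"" + name + "\",\"amount\":" + p.amount + "}";
    }

    // Step 2 (the role collect_list plays): gather the JSON strings per product,
    // keeping products in first-seen order.
    static Map<String, List<String>> aggregate(List<Purchase> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                p -> p.product,
                LinkedHashMap::new,
                Collectors.mapping(JsonAggregationSketch::toJson, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Purchase> data = Arrays.asList(
                new Purchase("book", "John", 59),
                new Purchase("book", "Björn", 61),
                new Purchase("tv", "Roger", 36));
        // Print one line per product with its list of JSON strings.
        aggregate(data).forEach((product, json) -> System.out.println(product + " " + json));
    }
}
```

Each product ends up with a list of JSON object strings, which is the same shape as the array<string> column that collect_list over to_json yields in Spark.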
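
Possible duplicate.

Can you show me how you would do this? Your possible-duplicate link does not cover aggregating into a JSON array.

Great! Thank you very much for your help, philantrovert!

If this solves the problem, feel free to accept your own answer.

I added a few more columns and this worked for me, but I found that to get the alias to work I had to move it onto the collect_list function rather than the to_json function, so for me it looked like this:

ds.groupBy(col("product")).agg(collect_list(to_json(struct(col("purchaser"), col("amount")))).alias("json"));
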
+-------+------------------------------------------------------------------+
|product|json                                                              |
+-------+------------------------------------------------------------------+
|tv     |[{purchaser: "Roger", amount:36}]                                 |
|book   |[{purchaser: "John", amount:59}, {purchaser: "Björn", amount:61}] |
+-------+------------------------------------------------------------------+