Java: Converting Spark DataFrame columns to a JSON object


I have a DataFrame containing the following data:

+-----------+-------+-----+
|file_name  | key   |Value|
+-----------+-------+-----+
| file1     | key1  | 7   |
| file1     | key2  | 11  |
| file1     | key3  | 3   |
| file2     | key1  | 9   |
| file2     | key2  | 2   |
| file2     | key3  | 10  |
+-----------+-------+-----+
With the following code I have solved one step of the problem:

dataset.select(col("file_name"), to_json(struct(col("key").alias("key"),col("value").alias("value"))).alias("output"))
       .groupBy(col("file_name")).agg(collect_list(col("output")).alias("output"))
       .show(false);
This gives me the following output:

+-----------+-------------------------------------------------------------------------------------+
|file_name  | output                                                                              |
+-----------+-------------------------------------------------------------------------------------+
| file1     |[{"key":"key1","value":"7"}, {"key":"key2","value":"11"}, {"key":"key3","value":"3"}]|
| file2     |[{"key":"key1","value":"9"}, {"key":"key2","value":"2"}, {"key":"key3","value":"10"}]|
+-----------+-------------------------------------------------------------------------------------+
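However, I want my final output to have the following JSON structure. Can you suggest any changes that would produce output in this format (a JSON object containing a JSON array)?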


Try adding another select statement:
select(col("file_name"), to_json(struct(col("output").alias("result"))).alias("output"))

The code should look something like this:


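dataset.select(col("file_name"), to_json(struct(col("key").alias("key"), col("value").alias("value"))).alias("output"))
       .groupBy(col("file_name")).agg(collect_list(col("output")).alias("output"))
       .select(col("file_name"), to_json(struct(col("output").alias("result"))).alias("output"))
       .show(false);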

You can put the result into a struct before calling to_json. Note that you should not call to_json twice, because that results in double-escaped quotes.

dataset.groupBy("file_name").agg(
    to_json(
        struct(
            collect_list(struct("key", "value")).alias("result")
        )
    ).alias("output")
).show(false);

+---------+----------------------------------------------------------------------------------------------+
|file_name|output                                                                                        |
+---------+----------------------------------------------------------------------------------------------+
|file2    |{"result":[{"key":"key1","value":"9"},{"key":"key2","value":"2"},{"key":"key3","value":"10"}]}|
|file1    |{"result":[{"key":"key1","value":"7"},{"key":"key2","value":"11"},{"key":"key3","value":"3"}]}|
+---------+----------------------------------------------------------------------------------------------+
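
To illustrate why a second to_json call double-escapes quotes, here is a minimal plain-Java sketch that does not use Spark at all. The class DoubleEscapeDemo and its helpers (toJsonString, wrapAsStrings, wrapAsStructure) are hypothetical names made up for this illustration. They only mimic the general behaviour of a JSON serializer: text that is already JSON gets treated as an ordinary string value, so its quotes are escaped again.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DoubleEscapeDemo {

    // Hypothetical helper: serialize a Java string as a JSON string literal.
    // Only backslashes and double quotes are escaped, which is enough here.
    static String toJsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Mimics serializing twice: each element is already JSON text, but it is
    // handled as a plain string value, so its quotes are escaped again.
    static String wrapAsStrings(String field, List<String> jsonElements) {
        String items = jsonElements.stream()
                .map(DoubleEscapeDemo::toJsonString)
                .collect(Collectors.joining(","));
        return "{\"" + field + "\":[" + items + "]}";
    }

    // Mimics serializing once: the elements are embedded as nested structure,
    // so no extra escaping is applied.
    static String wrapAsStructure(String field, List<String> jsonElements) {
        return "{\"" + field + "\":[" + String.join(",", jsonElements) + "]}";
    }

    public static void main(String[] args) {
        List<String> elements = List.of("{\"key\":\"key1\",\"value\":\"7\"}");
        // Element quotes come out escaped (an array of strings).
        System.out.println(wrapAsStrings("result", elements));
        // Element quotes stay as they are (an array of objects).
        System.out.println(wrapAsStructure("result", elements));
    }
}
```

The second form corresponds to building the struct first and serializing it once, which is why the answer above wraps collect_list(struct(...)) in a single to_json call.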