How to flatten wrapped arrays in a Spark Dataset in Java
I am using Spark 2.2 with Java 1.8. I need to collect an array column per group, but the result is giving me trouble. See below:
import static org.apache.spark.sql.functions.collect_list;
Dataset<Row> df2 = df.groupBy("id").agg(collect_list("values"));
df2.show(false); // false = do not truncate long cell values
# +-----+----------------------------------------------+
# |id| collect_list(values) |
# +-----+----------------------------------------------+
# |1 |[WrappedArray(1, 2, 3), WrappedArray(4, 5, 6)]|
# |2 |[WrappedArray(2), WrappedArray(3)] |
# +-----+----------------------------------------------+
Expected output:
# +-----+------------------+
# |store| values |
# +-----+------------------+
# |1 |[1, 2, 3, 4, 5, 6]|
# |2 |[2, 3] |
# +-----+------------------+
How can I get the output above in Spark with Java? Can anyone help? Thanks.

You can use the explode function before grouping:
df.withColumn("values", explode($"values")).groupBy("id").agg(collect_list($"values"))
Here is a Scala equivalent (not Java) that uses a UDF:

Output:
+-----+----------------------------------------------+-------------+
|store|values |values_new |
+-----+----------------------------------------------+-------------+
|1 |[WrappedArray(1, 2, 3), WrappedArray(4, 5, 6)]|[1,2,3,4,5,6]|
|2 |[WrappedArray(2), WrappedArray(3)] |[2,3] |
+-----+----------------------------------------------+-------------+
Hope this helps.

explode is an expensive operation and takes more time. Is there a different approach? Thanks.
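One way to avoid explode is the UDF route mentioned above: keep the collect_list result and flatten each row's nested array in a function. The Scala UDF itself did not survive on this page, so the following is only a minimal sketch of the per-row flattening logic in plain Java, without Spark dependencies. The class name FlattenSketch and the method flatten are hypothetical names chosen for illustration, and integer elements are assumed from the sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the name and signature are illustrative,
// not taken from the original answer.
class FlattenSketch {

    // Concatenates the inner lists in order; null input or null inner lists are skipped.
    static List<Integer> flatten(List<List<Integer>> nested) {
        List<Integer> flat = new ArrayList<>();
        if (nested == null) {
            return flat;
        }
        for (List<Integer> inner : nested) {
            if (inner != null) {
                flat.addAll(inner);
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Same shape as the collect_list(values) column for id 1 and id 2.
        List<List<Integer>> row1 = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6));
        List<List<Integer>> row2 = Arrays.asList(Arrays.asList(2), Arrays.asList(3));
        System.out.println(flatten(row1)); // prints [1, 2, 3, 4, 5, 6]
        System.out.println(flatten(row2)); // prints [2, 3]
    }
}
```

In Spark the nested column reaches a Java UDF as a Scala Seq (the WrappedArray shown in the output), so a real UDF would have to convert it first, for example with scala.collection.JavaConverters, and be registered with an array-of-integer return type. That wiring is not shown here and is an assumption about how the missing Scala UDF was built. On Spark 2.4 and later, the built-in flatten function in org.apache.spark.sql.functions does this directly, but it is not available in Spark 2.2.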