How to group DataFrame rows into a single delimiter-separated row in Scala Spark?
I have a Spark DataFrame:
+------+------+
|father|child |
+------+------+
|Aaron |Adam  |
|Aaron |Berel |
|Aaron |Kasper|
|Levi  |Saul  |
|Levi  |Tiger |
+------+------+
How can I group by father and put all the values into a single delimited field?
The result I expect:
+------------------------+
|union_all_name_by_father|
+------------------------+
|Aaron;Adam;Berel;Kasper |
|Levi;Saul;Tiger |
+------------------------+
You can use groupBy with collect_list, then concat_ws (note that collect_list does not guarantee element order after a shuffle):
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

val df2 = df.groupBy("father").agg(
  // gather all children of each father, joined by ";"
  concat_ws(";", collect_list(col("child"))).as("col2")
  // prepend the father's name to the joined children
).select(concat_ws(";", col("father"), col("col2")).as("union_all_name_by_father"))
df2.show(false)
+------------------------+
|union_all_name_by_father|
+------------------------+
|Aaron;Adam;Berel;Kasper |
|Levi;Saul;Tiger |
+------------------------+
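Spark aside, the same grouping logic can be sketched with plain Scala collections. This is a toy illustration, not the Spark API: groupBy plays the role of DataFrame.groupBy, and mkString(";") stands in for concat_ws(";", collect_list(...)).

```scala
// Toy sketch of the grouping logic using plain Scala collections (no Spark).
def groupChildren(rows: List[(String, String)]): List[String] =
  rows
    .groupBy(_._1)                                // father -> all (father, child) pairs
    .map { case (father, pairs) =>
      (father +: pairs.map(_._2)).mkString(";")   // "father;child1;child2;..."
    }
    .toList
    .sorted                                       // deterministic output order

val rows = List(
  ("Aaron", "Adam"), ("Aaron", "Berel"), ("Aaron", "Kasper"),
  ("Levi", "Saul"), ("Levi", "Tiger")
)
groupChildren(rows).foreach(println)
```

Unlike collect_list in distributed Spark, List.groupBy here preserves the original order of the children within each group.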
In Scala I haven't tried anything yet; I didn't know which way to go.