Exploding multiple columns into separate rows in Spark Scala
I have a DataFrame with the following structure:
Col1. Col2 Col3
Data1Col1,Data2Col1. Data1Col2,Data2Col2. Data1Col3,Data2Col3
I would like the resulting dataset to look like this:
Col1 Col2 Col3
Data1Col1. Data1Col2. Data1Col3
Data2Col1. Data2Col2 Data2Col3
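Please suggest how I can approach this. I tried explode, but it produces duplicated rows.
// In spark-shell, spark.implicits._ (needed for toDF) is already in scope.
import org.apache.spark.sql.functions.{col, explode, split, udf}
val df = Seq(("C,D,E,F","M,N,O,P","K,P,B,P")).toDF("Col1","Col2","Col3")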
df.show
+-------+-------+-------+
| Col1| Col2| Col3|
+-------+-------+-------+
|C,D,E,F|M,N,O,P|K,P,B,P|
+-------+-------+-------+
val res1 = df.withColumn("Col1",split(col("Col1"),",")).withColumn("Col2",split(col("Col2"),",")).withColumn("Col3",split(col("Col3"),","))
res1.show
+------------+------------+------------+
| Col1| Col2| Col3|
+------------+------------+------------+
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|
+------------+------------+------------+
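// Zip the three arrays element-wise; each element has the nested shape (z, (x, y)).
val zip = udf((x: Seq[String], y: Seq[String], z: Seq[String]) => z.zip(x.zip(y)))
// Keep the DataFrame in res14 (calling .show in the assignment would store Unit).
val res14 = res1.withColumn("test",explode(zip(col("Col1"),col("Col2"),col("Col3"))))
res14.show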
+------------+------------+------------+-----------+
| Col1| Col2| Col3| test|
+------------+------------+------------+-----------+
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[K, [C, M]]|
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[P, [D, N]]|
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[B, [E, O]]|
|[C, D, E, F]|[M, N, O, P]|[K, P, B, P]|[P, [F, P]]|
+------------+------------+------------+-----------+
// Pull the three values out of the nested struct: test = (t3, (t1, t2)).
res14.select(col("test._2._1").as("t1"), col("test._2._2").as("t2"), col("test._1").as("t3")).show
+---+---+---+
| t1| t2| t3|
+---+---+---+
| C| M| K|
| D| N| P|
| E| O| B|
| F| P| P|
+---+---+---+
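As an alternative to the UDF above, the same reshaping can probably be done with built-in functions only. The following is a minimal sketch, assuming Spark 2.4 or later (where `arrays_zip` is available) and a locally created session; the names `arrays`, `zipped` and `result` are made up for illustration. It splits every column, zips the resulting arrays element-wise into one array of structs, and explodes that single array so each position becomes exactly one row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{arrays_zip, col, explode, split}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-columns-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq(("C,D,E,F", "M,N,O,P", "K,P,B,P")).toDF("Col1", "Col2", "Col3")

// Split each comma-separated string column into an array column.
val arrays = df.select(df.columns.map(c => split(col(c), ",").as(c)): _*)

// Zip the arrays position by position into one array of structs,
// then explode once so that each struct becomes a single row.
val zipped = arrays.select(explode(arrays_zip(arrays.columns.map(col): _*)).as("z"))

// Flatten the struct and restore the column names explicitly, so the result
// does not depend on how arrays_zip names the struct fields.
val result = zipped.select("z.*").toDF(df.columns: _*)

result.show()
```

Exploding one zipped array, rather than calling explode on each column separately, is what avoids the duplicated rows mentioned in the question: separate explodes yield every combination of elements. Note that `arrays_zip` fills missing positions with null when the arrays differ in length, so this sketch assumes every column holds the same number of comma-separated values.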
res1 - the initial DataFrame
res14 - the intermediate DataFrame
Why are there stray dots in the sample data? Are they relevant?