Scala: convert a sequence of strings into join columns
I have the following sequence and DataFrames:
df1.select("link1", "link2").show
+-----+-----+
|link1|link2|
+-----+-----+
|    1|    1|
|    2|    1|
|    2|    1|
|    3|    1|
|    5|    2|
+-----+-----+
df2.select("link1_2", "link2_2").show
+-------+-------+
|link1_2|link2_2|
+-------+-------+
|      2|      1|
|      2|      4|
|      4|      1|
|      5|      2|
|      3|      4|
+-------+-------+
val col_names = Seq("link1", "link2")
I want to create the following join:
df1.join(df2, 'link1 === 'link1_2 && 'link2 === 'link2_2)
without hard-coding the join columns. I basically need a way to perform the following transformation:
Seq("str1", "str2", ...) -> 'str1 === 'str1_2 && 'str2 === 'str2_2 && ...
I tried the following, but it does not seem to work:
df1.join(df2, col_names map (str: String => col(str) === col(str + "_2")).foldLeft(true)(_ && _))
Does anyone know how to write the transformation above?

There is no need to traverse the column list twice. Just use foldLeft, as shown below:
import org.apache.spark.sql.functions._
import spark.implicits._
val df1 = Seq(
(1, 1), (2, 1), (2, 1), (3, 1), (5, 2)
).toDF("c1", "c2")
val df2 = Seq(
(2, 1), (2, 4), (4, 1), (5, 2), (3, 4)
).toDF("c1_2", "c2_2")
val cols = Seq("c1", "c2")
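// Fold the column names into a single Column predicate, starting from lit(true):
// true && c1 === c1_2 && c2 === c2_2
df1.
  join(df2, cols.foldLeft(lit(true))((cond, c) => cond && col(c) === col(c + "_2"))).
  show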
//+---+---+----+----+
//| c1| c2|c1_2|c2_2|
//+---+---+----+----+
//|  2|  1|   2|   1|
//|  2|  1|   2|   1|
//|  5|  2|   5|   2|
//+---+---+----+----+
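For comparison, the attempt in the question can probably be repaired with only small changes. Its fold starts from the Boolean `true`, so the accumulator is a Boolean rather than a Column, and Boolean `&&` does not accept a Column. The sketch below keeps the map step and combines the resulting columns with `reduce`. The helper name `suffixedJoinCondition` and its `suffix` parameter are made up for illustration and are not part of the answer above. The session setup assumes a local Spark installation.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper: builds name === name + suffix for each name and ANDs the results.
// reduce throws on an empty list, so that case is rejected explicitly.
def suffixedJoinCondition(names: Seq[String], suffix: String = "_2"): Column = {
  require(names.nonEmpty, "at least one column name is required")
  names.map(n => col(n) === col(n + suffix)).reduce(_ && _)
}

val df1 = Seq((1, 1), (2, 1), (2, 1), (3, 1), (5, 2)).toDF("c1", "c2")
val df2 = Seq((2, 1), (2, 4), (4, 1), (5, 2), (3, 4)).toDF("c1_2", "c2_2")

// Same inner join as in the answer above, with the condition built by the helper.
val joined = df1.join(df2, suffixedJoinCondition(Seq("c1", "c2")))
joined.show()
```

The two variants differ mainly on an empty column list: `reduce` fails, whereas the `foldLeft` version falls back to `lit(true)`, a condition that matches every pair of rows.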