Scala: create a new DataFrame column based on existing DataFrames in Spark
There are two DFs, and I need to populate a new column in DF1 that shows a flag based on the conditions below.
DF1
+------+-------------------+
|AMOUNT|Brand              |
+------+-------------------+
| 47.88|Parle              |
| 40.92|Parle              |
| 83.82|Parle              |
|106.58|Parle              |
| 90.51|Flipkart           |
| 11.48|Flipkart           |
| 18.47|Flipkart           |
| 40.92|Flipkart           |
|  30.0|Flipkart           |
+------+-------------------+
DF2
+--------------------+-------+----------+
|Brand               |P1     |P2        |
+--------------------+-------+----------+
|Parle               |37.00  |100.15    |
|Flipkart            |10.0   |30.0      |
+--------------------+-------+----------+
If the AMOUNT for a given Brand in DF1 is below that Brand's P1 value in DF2, the flag should be "low"; if it lies between P1 and P2 (inclusive), "mid"; and if it is above P2, "high". Expected output:
+------+-------------------+----------------+
|AMOUNT|Brand              |Flag            |
+------+-------------------+----------------+
| 47.88|Parle              |mid             |
| 40.92|Parle              |mid             |
| 83.82|Parle              |mid             |
|106.58|Parle              |high            |
| 90.51|Flipkart           |high            |
| 11.48|Flipkart           |mid             |
| 18.47|Flipkart           |mid             |
| 40.92|Flipkart           |high            |
|  30.0|Flipkart           |mid             |
+------+-------------------+----------------+
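
For anyone reproducing this, the sample frames can be built like so (a minimal sketch, assuming a SparkSession in scope as spark):

import spark.implicits._

// Sample data as shown in the tables above
val df1 = Seq(
  (47.88, "Parle"), (40.92, "Parle"), (83.82, "Parle"), (106.58, "Parle"),
  (90.51, "Flipkart"), (11.48, "Flipkart"), (18.47, "Flipkart"),
  (40.92, "Flipkart"), (30.0, "Flipkart")
).toDF("AMOUNT", "Brand")

val df2 = Seq(
  ("Parle", 37.00, 100.15),
  ("Flipkart", 10.0, 30.0)
).toDF("Brand", "P1", "P2")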
I know I can do a join and get the result, but how should I frame this logic in Spark?

A simple left join and nested built-in functions should get you your desired result:
import org.apache.spark.sql.functions._

// Left-join the per-brand thresholds onto df1, then bucket AMOUNT against P1/P2
df1.join(df2, Seq("Brand"), "left")
  .withColumn("Flag", when(col("AMOUNT") < col("P1"), "low").otherwise(
    when(col("AMOUNT") >= col("P1") && col("AMOUNT") <= col("P2"), "mid").otherwise(
      when(col("AMOUNT") > col("P2"), "high").otherwise("unknown"))))
  .select("AMOUNT", "Brand", "Flag")
  .show(false)
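
Equivalently, the nested otherwise calls can be flattened into a single when chain, which reads a bit more cleanly (a sketch of the same logic):

df1.join(df2, Seq("Brand"), "left")
  .withColumn("Flag",
    when(col("AMOUNT") < col("P1"), "low")
      .when(col("AMOUNT") >= col("P1") && col("AMOUNT") <= col("P2"), "mid")
      .when(col("AMOUNT") > col("P2"), "high")
      .otherwise("unknown")) // null P1/P2 (an unmatched Brand) falls through to here
  .select("AMOUNT", "Brand", "Flag")
  .show(false)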
Hope the answer is helpful.

I think it is also doable using a udf:
import org.apache.spark.sql.functions._

val df3 = df1.join(df2, Seq("Brand"), "left")

// Compare each amount against the brand's thresholds inside a udf
val mapper = udf((amount: Double, p1: Double, p2: Double) =>
  if (amount < p1) "low" else if (amount > p2) "high" else "mid")

df3.withColumn("Flag", mapper(df3("AMOUNT"), df3("P1"), df3("P2")))
  .select("AMOUNT", "Brand", "Flag")
  .show(false)
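
As a design note, the built-in when/otherwise version is generally preferable: built-in expressions stay visible to Spark's Catalyst optimizer, while a Scala udf is a black box to it.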
Thanks, I tried to use a broadcast join but it resulted in an error: val df2 = broadcast(df2).as("df2"); df1.join(broadcast(df2), Seq("Brand"), "left").withColumn("Flag", when(amount…
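
For reference, a working broadcast variant might look like the following (a sketch; broadcast is the hint from org.apache.spark.sql.functions, and the flag logic mirrors the answer above):

import org.apache.spark.sql.functions._

// Hint Spark to ship the small threshold table to every executor,
// so the join avoids shuffling df1
df1.join(broadcast(df2), Seq("Brand"), "left")
  .withColumn("Flag",
    when(col("AMOUNT") < col("P1"), "low")
      .when(col("AMOUNT") > col("P2"), "high")
      .otherwise("mid")) // assumes every Brand in df1 has a match in df2
  .select("AMOUNT", "Brand", "Flag")
  .show(false)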