How to filter a DataFrame in Spark Scala using a relational operator as a variable?
I have a DataFrame as below:
myDF:
+-----+
|value|
+-----+
|8 |
|8 |
|1 |
+-----+
The program reads another computed DataFrame and obtains the following two values:
val attr = 5
val opr = ">"
Now I need to filter myDF based on these values, so that my result looks like this:
resultDF:
+-----+----------+
|value|result |
+-----+----------+
|8 |GOOD |
|8 |GOOD |
|1 |BAD |
+-----+----------+
The code I am using:
val resultDF = myDF.withColumn("result", when(col("value") > attr, "GOOD").otherwise("BAD"))
Now, attr and opr will change dynamically, meaning the operator can be any of >, <, >=, <=, == or <>. First, as noted in the comments, using dynamic SQL without a strong reason is bad practice: the behavior is hard to predict and difficult to debug.
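If you do go the string-SQL route (for example, splicing the operator into an expr(...) condition), the operator should at least be validated against a whitelist first. A minimal pure-Scala sketch; allowedOps and validateOp are hypothetical names, and the six-operator set is an assumption taken from the question:

```scala
// Hypothetical whitelist check before splicing a user-supplied operator
// into a SQL string; assumes these six operators are the only valid ones.
val allowedOps = Set(">", "<", ">=", "<=", "==", "<>")

def validateOp(op: String): String =
  if (allowedOps.contains(op)) op
  else throw new IllegalArgumentException(s"Unsupported operator: $op")
```

Rejecting anything outside the whitelist keeps arbitrary strings out of the generated SQL.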
Assuming you have already joined the values with the operator column into a single DataFrame, you can use the following code:
import spark.implicits._

val appData: DataFrame = Seq(
  ("1", ">"),
  ("1", ">"),
  ("3", "<="),
  ("4", "<>"),
  ("6", ">="),
  ("6", "==")
).toDF("value", "operator")

val attr = 5

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">"  => value.toInt > sample
    case "<"  => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _    => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

import org.apache.spark.sql.functions._

val dynamic_compare = spark.udf.register("dynamic_compare", (v: String, op: String) => compare(v, op, attr))
appData.withColumn("result", dynamic_compare(col("value"), col("operator")))
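Since compare is plain Scala, its dispatch can be sanity-checked without a Spark session. A small sketch repeating the same function body, checked against the GOOD/BAD expectations from the question's resultDF:

```scala
// Same dispatch as the UDF body above, exercised without Spark.
def compare(value: String, operator: String, sample: Int): String = {
  val ok = operator match {
    case ">"  => value.toInt > sample
    case "<"  => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _    => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (ok) "GOOD" else "BAD"
}

// Matches resultDF in the question: 8 > 5 is GOOD, 1 > 5 is BAD.
assert(compare("8", ">", 5) == "GOOD")
assert(compare("1", ">", 5) == "BAD")
```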
Comments: "You need to use dynamic SQL. There are many questions on SO about Spark and dynamic SQL." "Hi Andrew, thanks for the suggestion. I want to do this with Spark DataFrame functions. Can you share some SO links?" "We can use a link similar to this question." "Thanks @Boris, I am testing this snippet, but I get a 'missing parameter type' error in val dynamic_compare = spark.udf.register("dynamic_compare", (v, op) => compare(v, op, attr))." "@scalauser I updated the answer to add the String type in the lambda expression." "It works as expected now. Thanks for your help."
If you have only a single operator value rather than an operator column, the UDF can close over it directly:

import spark.implicits._

val appData: DataFrame = Seq(
  "1",
  "1",
  "3",
  "4",
  "6",
  "6"
).toDF("value")

val attr = 5
val op = ">"

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">"  => value.toInt > sample
    case "<"  => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _    => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

import org.apache.spark.sql.functions._

val dynamic_compare = spark.udf.register("dynamic_compare", (value: String) => compare(value, op, attr))
appData.withColumn("result", dynamic_compare(col("value")))
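As a variation, the repeated match can be replaced with a lookup table from operator strings to comparison functions. A sketch under the same six-operator assumption; ops and compareWith are hypothetical names, not part of the answer above:

```scala
// Hypothetical: a table of comparison functions instead of a match expression.
val ops: Map[String, (Int, Int) => Boolean] = Map(
  ">"  -> ((a: Int, b: Int) => a > b),
  "<"  -> ((a: Int, b: Int) => a < b),
  ">=" -> ((a: Int, b: Int) => a >= b),
  "<=" -> ((a: Int, b: Int) => a <= b),
  "==" -> ((a: Int, b: Int) => a == b),
  "<>" -> ((a: Int, b: Int) => a != b)
)

def compareWith(value: String, operator: String, sample: Int): String =
  ops.get(operator) match {
    case Some(f) => if (f(value.toInt, sample)) "GOOD" else "BAD"
    case None    => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
```

Adding a new operator then means adding one map entry, and the same compareWith could be wrapped in a UDF exactly like compare above.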