
How to filter a dataframe in Spark Scala using a relational operator as a variable?


I have a dataframe that looks like this:

myDF:

+-----+
|value|
+-----+
|8    |
|8    |
|1    |
+-----+
The program reads another computed dataframe and obtains the following two values:

val attr = 5
val opr = ">"  // the operator arrives as a string
Now I need to filter myDF based on these values, so that my result looks like:

resultDF:
+-----+----------+
|value|result    |
+-----+----------+
|8    |GOOD      |
|8    |GOOD      |
|1    |BAD       |
+-----+----------+
The code I'm using:

val resultDF = myDF.withColumn("result", when(col("value") > attr, "GOOD").otherwise("BAD"))
Now, attr and opr will change dynamically. That means the operator can be any of >, <, >=, <=, ==, <>. First, as @ said, using dynamic SQL without a strong reason is bad practice: the behavior is undefined and it is hard to debug.
Assuming you have already joined the values with the operator dataframe, you can use the following code:

import spark.implicits._

val appData: DataFrame = Seq(
  ("1", ">"),
  ("1", ">"),
  ("3", "<="),
  ("4", "<>"),
  ("6", ">="),
  ("6", "==")
).toDF("value", "operator")

val attr = 5

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">" => value.toInt > sample
    case "<" => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _ => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

import org.apache.spark.sql.functions._
val dynamic_compare =  spark.udf.register("dynamic_compare", (v: String, op: String) => compare(v, op, attr))
appData.withColumn("result", dynamic_compare(col("value"), col("operator")))
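As an alternative sketch (not part of the original answer): when the operator is a driver-side variable rather than a per-row column, the operator string can be mapped to a native `Column` predicate instead of a UDF, which keeps the comparison inside Spark's expression engine. The helper `predicate` below is hypothetical:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper: map an operator string to a native Column predicate.
def predicate(c: Column, operator: String, sample: Int): Column = operator match {
  case ">"  => c > sample
  case "<"  => c < sample
  case ">=" => c >= sample
  case "<=" => c <= sample
  case "==" => c === sample
  case "<>" => c =!= sample
  case _    => throw new IllegalArgumentException(s"Wrong operator: $operator")
}

// Assumes opr holds the operator string and attr the threshold, as in the question.
val resultDF = myDF.withColumn("result",
  when(predicate(col("value"), opr, attr), "GOOD").otherwise("BAD"))
```

For per-row operators, as in `appData` above, this approach would need a chained `when` over the supported operators, so the UDF in the answer is simpler there.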
Comments:

- You need to use dynamic SQL. There are plenty of questions on SO about Spark and dynamic SQL.
- Hi Andrew, thanks for the suggestion. I'd like to do this with Spark dataframe functions. Can you share some SO links?
- We can use something similar to this question.
- Thanks @Boris, I'm testing this snippet. However, I get a "missing parameter type" error in `val dynamic_compare = spark.udf.register("dynamic_compare", (v, op) => compare(v, op, attr))`.
- @scalauser I updated the lambda expression, adding the String type.
- It works as expected. Thanks for your help.
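The "missing parameter type" error discussed here comes from registering an untyped lambda: Scala cannot infer the parameter types in that position. A minimal sketch of the fix, assuming `compare` and `attr` are in scope as defined in the answer:

```scala
// Does not compile: the lambda's parameter types cannot be inferred here.
// val dynamic_compare = spark.udf.register("dynamic_compare", (v, op) => compare(v, op, attr))

// Compiles: explicit String annotations let the UDF's type encoders be derived.
val dynamic_compare = spark.udf.register("dynamic_compare",
  (v: String, op: String) => compare(v, op, attr))
```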
If the operator is a single value rather than a per-row column, it can be simpler:

import spark.implicits._

val appData: DataFrame = Seq(
  "1",
  "1",
  "3",
  "4",
  "6",
  "6"
).toDF("value")

val attr = 5
val op = ">"

def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">" => value.toInt > sample
    case "<" => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _ => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

import org.apache.spark.sql.functions._
val dynamic_compare =  spark.udf.register("dynamic_compare", (value: String) => compare(value, op, attr))
appData.withColumn("result", dynamic_compare(col("value")))
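The matching logic itself can be sanity-checked without a SparkSession; a standalone sketch repeating the answer's `compare`:

```scala
// Same match as in the answer, runnable as plain Scala.
def compare(value: String, operator: String, sample: Int): String = {
  val isValueCorrectForAttr: Boolean = operator match {
    case ">"  => value.toInt > sample
    case "<"  => value.toInt < sample
    case ">=" => value.toInt >= sample
    case "<=" => value.toInt <= sample
    case "==" => value.toInt == sample
    case "<>" => value.toInt != sample
    case _    => throw new IllegalArgumentException(s"Wrong operator: $operator")
  }
  if (isValueCorrectForAttr) "GOOD" else "BAD"
}

assert(compare("8", ">", 5) == "GOOD")
assert(compare("1", ">", 5) == "BAD")
assert(compare("5", "==", 5) == "GOOD")
assert(compare("5", "<>", 5) == "BAD")
```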