Spark Scala UDF limited to 10 parameters


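I need to create a Spark UDF that takes 11 arguments. Is there any way to do that? I know that a UDF can be created with up to 10 arguments.

Below is the code with 10 arguments. It works: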

val testFunc1 = (one: String, two: String, three: String, four: String,
                 five: String, six: String, seven: String, eight: String, nine: String, ten: String) => {
    if (isEmpty(four)) false
    else four match {
        case "RDIS" => three == "ST"
        case "TTSC" => nine == "UT" && eight == "RR"
        case _ => false
    }
}
import org.apache.spark.sql.functions.udf    
udf(testFunc1)
Below is the code with 11 arguments. It runs into an "unspecified value parameter: dataType" error:

val testFunc2 = (one: String, two: String, three: String, four: String,
                 five: String, six: String, seven: String, eight: String, nine: String, ten: String, ELEVEN: String) => {
  if (isEmpty(four)) false
  else four match {
    case "RDIS" => three == "ST"
    case "TTSC" => nine == "UT" && eight == "RR" && ELEVEN == "OR"
    case _ => false
  }
}
import org.apache.spark.sql.functions.udf    
udf(testFunc2) // compilation error

You can create a new column that is an array of the columns:

df.withColumn("arrCol", array("col1", "col2", "col3", ...))

Now you can run the UDF on the array:

def testFunc(vals: Seq[String]): String = ...
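The array approach can be sketched end to end with the 11-argument logic from the question. This is a minimal sketch, not a definitive implementation: the column names `c1` to `c11`, the object name `ArrayUdfSketch`, the sample rows, and the local `SparkSession` are all assumptions made for illustration, and the undefined `isEmpty` helper from the question is replaced by an explicit null-or-empty check. As far as I know, `functions.udf` only has typed overloads for functions of up to 10 arguments, which would explain why the 11-argument version does not compile.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, udf}

object ArrayUdfSketch {
  // Plain Scala logic over the packed values: index i holds the (i + 1)-th argument.
  def testFunc(vals: Seq[String]): Boolean = {
    val three  = vals(2)
    val four   = vals(3)
    val eight  = vals(7)
    val nine   = vals(8)
    val eleven = vals(10)
    // Stand-in for the question's isEmpty helper (assumption: null or "" counts as empty).
    if (four == null || four.isEmpty) false
    else four match {
      case "RDIS" => three == "ST"
      case "TTSC" => nine == "UT" && eight == "RR" && eleven == "OR"
      case _      => false
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical column names c1 .. c11.
    val names = (1 to 11).map(i => s"c$i")
    val df = Seq(
      ("a", "b", "ST", "RDIS", "e", "f", "g", "h", "i", "j", "k"),
      ("a", "b", "c", "TTSC", "e", "f", "g", "RR", "UT", "j", "OR"),
      ("a", "b", "c", "OTHER", "e", "f", "g", "h", "i", "j", "k")
    ).toDF(names: _*)

    // A one-argument UDF over the array column sidesteps the 10-argument limit.
    val testUdf = udf(testFunc _)

    df.withColumn("arrCol", array(names.map(col): _*))
      .withColumn("result", testUdf(col("arrCol")))
      .show()

    spark.stop()
  }
}
```

The trade-off is that arguments are addressed by position, so the order of the columns passed to `array` has to match the indices used inside the function. All packed columns also need a common type, here `String`.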

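I suggest packing the arguments into a map. As clarified in the comments, `map` takes its arguments as alternating key and value columns (key, value, key, value, and so on), not as `key -> value` pairs. The example below stores the map in a `udf_args` column first; you can also pass it to the UDF directly and skip the extra `withColumn`, as in `df.withColumn("udf_result", myUDF(map(lit("one"), $"one", lit("two"), $"one"))).show`.
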
import org.apache.spark.sql.functions._

val df = sc.parallelize(Seq(("a","b"),("c","d"),("e","f"))).toDF("one","two")


val myUDF = udf((input:Map[String,String]) => {
  // do something with the input
  input("one")=="a"
})

df
  .withColumn("udf_args", map(
    lit("one"), $"one",
    lit("two"), $"one"
  ))
  .withColumn("udf_result", myUDF($"udf_args"))
  .show()

+---+---+--------------------+----------+
|one|two|            udf_args|udf_result|
+---+---+--------------------+----------+
|  a|  b|Map(one -> a, two...|      true|
|  c|  d|Map(one -> c, two...|     false|
|  e|  f|Map(one -> e, two...|     false|
+---+---+--------------------+----------+
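Applied to the 11-argument function from the question, the map approach might look like the sketch below. It is only an illustration under stated assumptions: the column names (`one` to `eleven`), the object name `MapUdfSketch`, the sample rows, and the local `SparkSession` are hypothetical, and the question's `isEmpty` helper is assumed to mean "null or empty string".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map, udf}

object MapUdfSketch {
  // Plain Scala logic over the packed arguments, looked up by name.
  def testFunc(args: Map[String, String]): Boolean = {
    // A missing key is treated like a null value (assumption).
    def arg(name: String): String = args.getOrElse(name, null)
    Option(arg("four")).filter(_.nonEmpty) match {
      case Some("RDIS") => arg("three") == "ST"
      case Some("TTSC") => arg("nine") == "UT" && arg("eight") == "RR" && arg("eleven") == "OR"
      case _            => false
    }
  }

  def main(cmdArgs: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical column names matching the parameter names in the question.
    val names = Seq("one", "two", "three", "four", "five", "six",
                    "seven", "eight", "nine", "ten", "eleven")
    val df = Seq(
      ("a", "b", "ST", "RDIS", "e", "f", "g", "h", "i", "j", "k"),
      ("a", "b", "c", "TTSC", "e", "f", "g", "RR", "UT", "j", "OR")
    ).toDF(names: _*)

    val myUDF = udf(testFunc _)

    // map expects alternating key and value columns: key, value, key, value, ...
    val packed = map(names.flatMap(n => Seq(lit(n), col(n))): _*)

    df.withColumn("udf_result", myUDF(packed))
      .select("three", "four", "eight", "nine", "eleven", "udf_result")
      .show()

    spark.stop()
  }
}
```

Compared with an array, the map lets the function refer to arguments by name instead of by position, at the cost of repeating the key strings on both sides.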