How to reduce multiple when cases in Scala Spark

Beginner question: how can I optimize or shorten the following expression?

when(x1._1,x1._2).when(x2._1,x2._2).when(x3._1,x3._2).when(x4._1,x4._2).when(x5._1,x5._2)....
.when(xX._1,xX._2).otherwise(z)

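x1, x2, x3, ..., xX are maps (pairs), where x1._1 is the condition and x1._2 is the value to return when that condition holds.

I tried storing the maps in a list and then using map and reduce, but that produced: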

when(x1._1,x1._2).otherwise(z) && when(x2._1,x2._2).otherwise(z)...

which is wrong. I have about 10 lines of plain when cases and would like to reduce them so that my code is clearer.

You can use foldLeft on the list of maps:

val maplist = List(x1, x2)  // add more x if needed

val new_col = maplist.tail.foldLeft(when(maplist.head._1, maplist.head._2))((x,y) => x.when(y._1, y._2)).otherwise(z)

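Another option is coalesce. A when expression without otherwise returns null if its condition is not met, so coalesce moves on to the next when expression until it finds a non-null result: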

val new_col = coalesce((maplist.map(x => when(x._1, x._2)) :+ z):_*)
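The sketch below checks the two variants against each other on a tiny DataFrame. It assumes a throwaway local SparkSession; the column names (a, b), the conditions and the default z = lit(0) are made up for illustration, and maplist stands in for the list of (condition, value) pairs from the question.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit, when}

object WhenChainDemo {
  // Assumption: a local session created only for this demonstration.
  val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("when-chain").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data.
  val df = Seq((1, 10), (5, 20), (9, 30)).toDF("a", "b")

  // Hypothetical default and (condition, value) pairs.
  val z: Column = lit(0)
  val maplist: List[(Column, Column)] = List(
    (col("a") < 3, col("b")),
    (col("a") < 7, col("b") * 2)
  )

  // Variant 1: chain .when calls with foldLeft, then close with otherwise.
  val chained: Column =
    maplist.tail
      .foldLeft(when(maplist.head._1, maplist.head._2))((acc, x) => acc.when(x._1, x._2))
      .otherwise(z)

  // Variant 2: one when per pair, first non-null result wins, z as the fallback.
  val coalesced: Column =
    coalesce((maplist.map(x => when(x._1, x._2)) :+ z): _*)

  // Collect both columns side by side for comparison.
  val rows: List[(Int, Int)] =
    df.select(chained.as("chained"), coalesced.as("coalesced"))
      .collect()
      .map(r => (r.getInt(0), r.getInt(1)))
      .toList
}
```

One difference to keep in mind: coalesce also skips a branch whose condition holds but whose value is null, whereas the chained when returns that null. The two variants only agree when the values themselves are never null.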

Another approach is to pass the otherwise value as the initial value of foldLeft.
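The code for this answer did not survive in the text, so the following is only a sketch of what such a fold might look like, not the answerer's original. Starting from the default means each step wraps the accumulator as when(cond, value).otherwise(acc), so the pair folded in last ends up outermost and is tested first. The list is therefore reversed before folding to keep x1 at the highest priority. The helper name firstMatch, the local SparkSession and the sample data are all hypothetical.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object FoldFromDefaultDemo {
  // Hypothetical helper: fold the (condition, value) pairs around the default.
  // Reversing first keeps the first pair in the list as the outermost check.
  def firstMatch(cases: Seq[(Column, Column)], default: Column): Column =
    cases.reverse.foldLeft(default) { case (acc, (cond, value)) =>
      when(cond, value).otherwise(acc)
    }

  // Assumption: a local session created only for this demonstration.
  val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("fold-from-default").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data.
  val df = Seq(1, 5, 9).toDF("a")

  // For a = 1 both conditions hold; the first pair should win.
  val newCol: Column = firstMatch(
    Seq((col("a") < 3, lit("low")), (col("a") < 7, lit("mid"))),
    lit("high")
  )

  val labels: List[String] =
    df.select(newCol.as("label")).collect().map(_.getString(0)).toList

  // With no pairs at all, the fold simply returns the default.
  val onlyDefault: List[String] =
    df.select(firstMatch(Nil, lit("high")).as("label")).collect().map(_.getString(0)).toList
}
```

Unlike the head/tail version, this form needs no special case for an empty list of pairs.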

You can create a simple recursive method that assembles the nested when/otherwise conditions:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, when}

def nestedCond(cols: Array[String], default: String): Column = {
  def loop(ls: List[String]): Column = ls match {
    case Nil => col(default)
    case c :: tail => when(col(s"$c._1"), col(s"$c._2")).otherwise(loop(tail))
  }
  loop(cols.toList).as("nested-cond")
}

Testing the method:

val df = Seq(
  ((false, 1), (false, 2), (true, 3), 88),
  ((false, 4), (true, 5), (true, 6), 99)
).toDF("x1", "x2", "x3", "z")

val cols = df.columns.filter(_.startsWith("x"))
// cols: Array[String] = Array(x1, x2, x3)

df.select(nestedCond(cols, "z")).show
// +-----------+
// |nested-cond|
// +-----------+
// |          3|
// |          5|
// +-----------+

Alternatively, use foldRight to assemble the nested conditions:

def nestedCond(cols: Array[String], default: String): Column =
  cols.foldRight(col(default)){ (c, acc) =>
      when(col(s"$c._1"), col(s"$c._2")).otherwise(acc)
    }.as("nested-cond")

Thanks a lot, I'll give it a try this afternoon. Update: both solutions work, thank you! I'll run some more tests with the coalesce alternative, but so far so good.