Scala: concatenate a list of columns unless any of them is null


scala dataframe apache-spark


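I have a DataFrame and want to add a new column that concatenates the values of all the columns named in listOfFixedColumns, joined with "_". If any column in listOfFixedColumns is null, the new column's value should be null: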

+----+----+---------+
|   a|   b|unique_id|
+----+----+---------+
| foo| bar|  foo_bar|
|null| bar|     null|
| baz|null|     null|
|null|null|     null|
+----+----+---------+

I tried the following, but it only joins whatever non-null column values are present:

val listOfFixedColumns = List("A", "B", ..) // dynamic list of columns names as strings
df.withColumn("unique_id", concat_ws("_", listOfFixedColumns.map(c => col(c)): _*))

But I don't know how to handle the null cases:

+----+----+---------+
|   a|   b|unique_id|
+----+----+---------+
| foo| bar|  foo_bar|
|null| bar|      bar| <-- needs a fix
| baz|null|      baz| <-- needs a fix
|null|null|     null|
+----+----+---------+

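You can use the isNull method of the Column class, combined with the || (or) operator, to determine when any of the columns is null. Then use that condition in when/otherwise: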
import org.apache.spark.sql.functions.{col, concat_ws, when}
// toDF needs spark.implicits._, which spark-shell imports automatically

val df = Seq(
  ("foo", "bar", "foo_bar"),
  (null, "bar", null),
  ("baz", null, null),
  (null, null, null)
).toDF("A", "B", "C")

val listOfFixedColumns = List("A", "B", "C")

// true when at least one of the listed columns is null
val hasNull = listOfFixedColumns
  .map(col(_).isNull)
  .reduce(_ || _)

val concatNonEmpty = concat_ws("_", listOfFixedColumns.map(col): _*)

df.withColumn("unique_id", when(!hasNull, concatNonEmpty).otherwise(null)).show
// +----+----+-------+---------------+
// |   A|   B|      C|      unique_id|
// +----+----+-------+---------------+
// | foo| bar|foo_bar|foo_bar_foo_bar|
// |null| bar|   null|           null|
// | baz|null|   null|           null|
// |null|null|   null|           null|
// +----+----+-------+---------------+
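Of course, in real life you would probably build the list from df.columns (possibly dropping the columns you don't want to include) to make the code more robust.

I think @kumar's case needs the logic applied to a subset of the columns rather than to all of them.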
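As a rough sketch of the df.columns idea, the column list could be derived by excluding the columns that should not take part. The helper names concatUnlessNull and columnsExcept, the object name, and the sample data below are hypothetical and only for illustration. The sketch assumes a local SparkSession.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

object UniqueIdSketch {
  // Hypothetical helper: joins `cols` with `sep`, but yields null
  // as soon as any one of those columns is null.
  def concatUnlessNull(cols: Seq[String], sep: String = "_"): Column = {
    require(cols.nonEmpty, "need at least one column")
    val anyNull = cols.map(col(_).isNull).reduce(_ || _)
    when(anyNull, lit(null).cast("string"))
      .otherwise(concat_ws(sep, cols.map(col): _*))
  }

  // Hypothetical helper: every column of `df` except those in `excluded`.
  def columnsExcept(df: DataFrame, excluded: Set[String]): Seq[String] =
    df.columns.filterNot(excluded.contains).toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("unique-id-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data made up for this sketch; column C is left out of the id.
    val df = Seq(
      ("foo", "bar", "x"),
      (null, "bar", "y"),
      ("baz", null, "z")
    ).toDF("A", "B", "C")

    val fixed = columnsExcept(df, Set("C")) // Seq("A", "B")
    df.withColumn("unique_id", concatUnlessNull(fixed)).show()

    spark.stop()
  }
}
```

With this sample data, only the first row has both A and B set, so it gets foo_bar; the other two rows get null. Putting the condition in a function that returns a Column means the same expression can be reused on any DataFrame that has the listed columns.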