Scala: if a row in a column is null, get the column name and another column's value

scala apache-spark

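I have a Spark DataFrame with an id column and several other columns. The id cannot be null, but the other columns may contain nulls. The input DataFrame is: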

|A|     |id|    |B|
|1|     |1|     |2|
|null|  |2|     |3|
|null|  |3|     |null|

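I want to capture every column that contains a null together with the corresponding id, with the results for each column appended one below the other. Expected output: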

|colName|   |id|
|A|     |2| 
|A|     |3|
|B|     |3|

Thanks in advance. Please try to avoid manual loops.

I tried the following approach; please suggest changes if any are needed.

import spark.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
//sample RDD
val rdd=spark.sparkContext.parallelize(Seq(Row(1,1,2),Row(2,null,3),Row(3,null,null)))
//schema
val schema=(new StructType).add("ID",IntegerType).add("A",IntegerType).add("B",IntegerType)
//creating dataframe
var df=spark.createDataFrame(rdd,schema)
df.show
+---+----+----+
| ID|   A|   B|
+---+----+----+
|  1|   1|   2|
|  2|null|   3|
|  3|null|null|
+---+----+----+

//get all the columns except ID column
val columnsExceptID=df.columns.filter(_!="ID")
//fill the corresponding column name in the place of null
df=columnsExceptID.foldLeft(df){(df,column)=>df.withColumn(column,when(col(column).isNull,column).otherwise(""))}

//array + explode ---> to get required output pattern of DF      
df=df.withColumn("colName",array(columnsExceptID.map(col(_)):_*)).drop(columnsExceptID:_*)

df=df.select('ID,explode('colName)).where(length('col)>0)
df.show
+---+---+
| ID|col|
+---+---+
|  2|  A|
|  3|  A|
|  3|  B|
+---+---+
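
The same array + explode idea can also be written without reassigning a `var` and without the empty-string placeholder. The following is only a sketch under a few assumptions: it builds its own local `SparkSession` (in `spark-shell` the existing `spark` can be used instead), and the names `input`, `valueColumns`, `nullMarkers` and `result` are illustrative, not taken from the original post. It relies on `when` without `otherwise` yielding null for non-matching rows, so the non-null cells can be dropped with `isNotNull` after the explode.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, when}
import org.apache.spark.sql.types.{IntegerType, StructType}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("null-columns").getOrCreate()

// Same sample data as in the answer above.
val schema = new StructType().add("ID", IntegerType).add("A", IntegerType).add("B", IntegerType)
val rows = Seq(Row(1, 1, 2), Row(2, null, 3), Row(3, null, null))
val input = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

// Every column except the id, selected by name.
val valueColumns = input.columns.filter(_ != "ID")

// One expression per value column: the column's name where the cell is null, otherwise null.
val nullMarkers = valueColumns.map(c => when(col(c).isNull, lit(c)))

// Pack the markers into an array, explode to one row per column, keep only the null hits.
val result = input
  .select(col("ID"), explode(array(nullMarkers: _*)).as("colName"))
  .where(col("colName").isNotNull)
  .select("colName", "ID")

result.show()
```

For the sample data this should produce the pairs (A, 2), (A, 3) and (B, 3) from the question's expected output; the row order is not guaranteed. The sketch assumes there is at least one column besides `ID`.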

Don't just take the tail of the column list; filter the id column out explicitly, i.e. df.columns.filter(_ != "ID").
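
The difference between dropping the first column by position and filtering by name can be seen in a small plain-Scala sketch. It assumes the column order from the question's input table, where the id is not the first column; the array `columns` stands in for `df.columns` and the other names are illustrative.

```scala
// Column order as in the question's input table: the id sits in the middle.
val columns = Array("A", "ID", "B")

// Positional: drops whatever happens to come first, so "ID" is kept and "A" is lost.
val byPosition = columns.tail

// By name: drops "ID" wherever it appears.
val byName = columns.filter(_ != "ID")

println(byPosition.mkString(", ")) // ID, B
println(byName.mkString(", "))     // A, B
```

Filtering by name keeps the code correct if the schema is later reordered, which `tail` does not.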