How do I set a struct to null in Scala Spark when all the values in the struct are null?
Tags: scala, apache-spark, apache-spark-sql

I have a Spark Scala DataFrame in which one column is a struct. When all the values in the struct are null, I want the column itself to be null instead of an object whose fields are all null.
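import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.struct
import spark.implicits._ // needed for toDF and the $"..." column syntax

val someDF = Seq(
  (8, null, null),
  (64, "mouse", "s"),
  (-27, "horse", "e")
).toDF("a", "b", "c")

// Builds the per-week struct column from columns b and c.
def make_week_struct(week: String): Column =
  struct($"b", $"c").alias(s"wks_${week}_jrny")

val week1_summary = make_week_struct("1")
val dd = someDF.select($"a", week1_summary)
display(dd) // notebook helper; dd.show(false) works in a plain Spark shell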
Sample data
a b c
8 null null
64 mouse s
-27 horse e
Current output
a wks_1_jrny
8 object:{b:null, c:null}
64 object:{b:"mouse", c:"s"}
-27 object:{b:"horse", c:"e"}
Expected output
a wks_1_jrny
8 null
64 object:{b:"mouse", c:"s"}
-27 object:{b:"horse", c:"e"}
You can also use the to_json function and filter out the rows whose struct serializes to an empty JSON object, {}.
scala> dd
  .withColumn("wks_1_jrny",
    when(
      to_json($"wks_1_jrny") =!= "{}", // keep the struct only when its JSON is not empty
      $"wks_1_jrny"
    ) // no otherwise(): rows that fail the condition become null
  )
  .show(false)
+---+----------+
|a |wks_1_jrny|
+---+----------+
|8 |null |
|64 |[mouse,s] |
|-27|[horse,e] |
+---+----------+
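The comparison with "{}" relies on to_json leaving out null fields, so a struct whose fields are all null serializes to an empty object. In Spark 3.0 and later this behavior is governed by the ignoreNullFields JSON option, which is on by default; if it has been switched off, the check will no longer match. Below is a minimal, self-contained sketch of the same idea, assuming a throwaway local SparkSession. The object name ToJsonNullStruct and the method name build are hypothetical and not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{struct, to_json, when}

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object ToJsonNullStruct {

  // Rebuilds the question's DataFrame and nulls out the all-null struct.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val someDF = Seq[(Int, String, String)](
      (8, null, null),
      (64, "mouse", "s"),
      (-27, "horse", "e")
    ).toDF("a", "b", "c")

    val withStruct = someDF.select($"a", struct($"b", $"c").alias("wks_1_jrny"))

    // to_json skips null fields by default, so an all-null struct becomes "{}".
    // when(...) without otherwise(...) yields null for rows that fail the test.
    withStruct.withColumn(
      "wks_1_jrny",
      when(to_json($"wks_1_jrny") =!= "{}", $"wks_1_jrny")
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session created just for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("to-json-null-struct")
      .getOrCreate()
    build(spark).show(false)
    spark.stop()
  }
}
```

A struct with only some null fields serializes to a non-empty object, so it is kept unchanged.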
This should also work:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType // needed for the lit(null: StructType) ascription
import spark.implicits._

val df = List(
  (None, None),
  (None, Some("abc")),
  (Some(1), Some("xyz"))
).toDF("id", "name")

val structCols = Seq("id", "name")
// The struct built from the real column values.
val dataStruct = struct(structCols.map(col): _*)
// A struct with the same field names and types, but every field set to null.
val emptyStruct = struct(
  df.schema.fields
    .filter(f => structCols.contains(f.name))
    .map(f => lit(null).cast(f.dataType).as(f.name)): _*
)

df
  .select(when(dataStruct.equalTo(emptyStruct), lit(null: StructType)).otherwise(dataStruct).as("col"))
  .show(false)
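Another way to stay independent of the field types is to test each field with isNull and build the struct only when at least one field has a value. The following is a sketch under stated assumptions, not the method from either answer above: the object name NullIfAllNull and the helper structOrNull are hypothetical, and the local SparkSession exists only for the illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, when}

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object NullIfAllNull {

  // Hypothetical helper: a struct of the named columns, or null when every one is null.
  def structOrNull(names: Seq[String], alias: String): Column = {
    require(names.nonEmpty, "at least one column name is required")
    // isNull is defined for every data type, so the fields may have mixed types.
    val allNull = names.map(n => col(n).isNull).reduce(_ && _)
    // when(...) without otherwise(...) yields null for the all-null rows.
    when(!allNull, struct(names.map(col): _*)).as(alias)
  }

  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq[(Int, Option[Int], Option[String])](
      (1, None, None),
      (2, None, Some("abc")),
      (3, Some(1), Some("xyz"))
    ).toDF("key", "id", "name")
    df.select(col("key"), structOrNull(Seq("id", "name"), "col"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session created just for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-if-all-null")
      .getOrCreate()
    build(spark).show(false)
    spark.stop()
  }
}
```

Only the row where both id and name are null ends up with a null struct; a row with a single non-null field keeps its struct.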
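Can you add some sample data and the expected output?
@Srinivas Added! I also tried to simplify it, thanks.
Are the struct elements in the actual dataset all of the same data type?
In my example they are the same, but I want a type-agnostic solution. @LeoC
@SalsaSteve flagged this question as a duplicate and as a conduct violation.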