Scala: Split a column in Spark and convert empty values to null

When splitting a column in Spark, I'm trying to fill the empty values with null. For example:

| A        |
| 1.2.3    |
| 4..5     |
I'm looking for:

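| A     | A split 1 | A split 2 | A split 3 |
| 1.2.3 | 1         | 2         | 3         |
| 4..5  | 4         | null      | 5         |

You can transform the result of split with the transform higher-order function (available since Spark 2.4), replacing empty strings with null: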

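import org.apache.spark.sql.functions.expr
import spark.implicits._ // $"..." syntax; assumes a SparkSession named spark, as in spark-shell
// Split A on literal dots, then map every empty token to null
val result = df.withColumn(
    "split",
    expr("transform(split(A, '\\\\.'), x -> case when x = '' then null else x end)")
).select($"A", $"split"(0), $"split"(1), $"split"(2))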

result.show
+-----+--------+--------+--------+
|    A|split[0]|split[1]|split[2]|
+-----+--------+--------+--------+
|1.2.3|       1|       2|       3|
| 4..5|       4|    null|       5|
+-----+--------+--------+--------+
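The expression above runs inside Spark, so its per-row effect can be hard to see in isolation. The following is a plain-Scala model of that effect, not Spark code: SplitToNullModel and splitWithNulls are hypothetical names, and Option/None stands in for SQL null. It assumes, as Spark's split does by default, that trailing empty tokens are kept, which is why it passes a limit of -1 to String.split.

```scala
object SplitToNullModel {
  // Models split(A, '\\.') followed by
  // transform(..., x -> case when x = '' then null else x end) for one value of A.
  // The -1 limit keeps trailing empty tokens, matching Spark's default split behaviour.
  def splitWithNulls(a: String): List[Option[String]] =
    a.split("\\.", -1).toList.map(tok => if (tok.isEmpty) None else Some(tok))

  def main(args: Array[String]): Unit =
    List("1.2.3", "4..5").foreach(a => println(s"$a -> ${splitWithNulls(a)}"))
}
```

Running main prints `1.2.3 -> List(Some(1), Some(2), Some(3))` and `4..5 -> List(Some(4), None, Some(5))`; the None sits in the same position as the null in the table above.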

Alternatively, split the column and, while extracting the array items as columns, use when to change any empty element to null:

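import org.apache.spark.sql.functions.{col, lit, split, when}

// n is the max array size from split (3 in your example)
val n = 3

// Split A on literal dots, then turn each array element into its own column,
// replacing empty strings with null
val df1 = df.withColumn(
    "ASplit",
    split(col("A"), "[.]")
  ).select(
    Seq(col("A")) ++ (0 to n-1).map(i =>
      when(col("ASplit")(i) === "", lit(null)).otherwise(col("ASplit")(i)).as(s"A split $i")
    ): _*
  )

df1.show()
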
//+-----+---------+---------+---------+
//|    A|A split 0|A split 1|A split 2|
//+-----+---------+---------+---------+
//|1.2.3|        1|        2|        3|
//| 4..5|        4|     null|        5|
//+-----+---------+---------+---------+
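The when-based version can be modelled the same way in plain Scala (again not Spark code; FixedColumnsModel and toColumns are hypothetical names, and None stands in for null). Besides mapping empty tokens to None, it pads short rows so every row yields n values, which is intended to mirror the null Spark returns for an out-of-range array index when ANSI mode is off. It also derives n from the data instead of hard-coding 3; in Spark that would be an aggregation such as max(size(ASplit)) over the DataFrame.

```scala
object FixedColumnsModel {
  // Models the per-row logic of
  // (0 to n-1).map(i => when(col("ASplit")(i) === "", lit(null)).otherwise(col("ASplit")(i)))
  def toColumns(a: String, n: Int): List[Option[String]] = {
    val parts = a.split("[.]", -1).toList // -1 keeps trailing empty tokens
    // lift(i) gives None past the end of the list; filter turns "" into None
    (0 until n).toList.map(i => parts.lift(i).filter(_.nonEmpty))
  }

  def main(args: Array[String]): Unit = {
    val rows = List("1.2.3", "4..5", "6.7")
    // n = largest number of parts over all rows, instead of a hard-coded 3
    val n = rows.map(_.split("[.]", -1).length).max
    rows.foreach(a => println(s"$a -> ${toColumns(a, n)}"))
  }
}
```

With the sample rows, n is 3 and the short row 6.7 gets None in its third position.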

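Assuming the splitting itself is solved: do you want to create new columns from the array, or only replace the empty strings with null inside the split array?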