Scala: converting nested nulls to empty strings in a Spark DataFrame

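I want to convert null values nested inside an array of strings to empty strings in Spark. The data is in a DataFrame. I plan to run a reduce function once the DataFrame is null-safe, though I am not sure whether that is relevant to the question. I am using Spark 1.6.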

Schema:

root
|-- carLineName: array (nullable = true)
|    |-- element: string (containsNull = true)

Sample input:

+--------------------+
|carLineName         |
+--------------------+
|[null,null,null]    |
|[null, null]        |
|[Mustang, null]     |
|[Pilot, Jeep]       |
+--------------------+

Expected output:

+--------------------+
|carLineName         |
+--------------------+
|[,,]                |
|[,]                 |
|[Mustang,]          |
|[Pilot, Jeep]       |
+--------------------+

My attempt:

val safeString: Seq[String] => Seq[String] = s => if (s == null) "" else s
val udfSafeString = udf(safeString)

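The input to the UDF is a sequence of strings, not a single string, so the null check in your attempt is applied to the whole array rather than to each element. You need to map over the sequence and replace the nulls element by element. You can do this as follows: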

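import org.apache.spark.sql.functions.udf

// Replace each null element of the array with an empty string.
val udfSafeString = udf((arr: Seq[String]) => {
  arr.map(s => if (s == null) "" else s)
})

// The $"..." column syntax needs sqlContext.implicits._ in scope on Spark 1.6.
df.withColumn("carLineName", udfSafeString($"carLineName"))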
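
One thing to watch for: the schema marks the array column itself as nullable, so the whole array may be null, not just its elements. In that case calling map on it would throw a NullPointerException. Below is a minimal sketch of the per-element logic in plain Scala, without Spark, that also guards against a null array. The name nullSafeStrings is a hypothetical helper, not part of the original answer. Treating a null array as an empty one is an assumption; you may prefer to keep it null.

```scala
// Hypothetical helper (not from the original answer): null-safe at both levels.
def nullSafeStrings(arr: Seq[String]): Seq[String] =
  Option(arr)                          // None when the whole array is null
    .getOrElse(Seq.empty[String])      // assumption: a null array becomes an empty one
    .map(s => Option(s).getOrElse("")) // each null element becomes ""

// Plain-Scala check using the sample rows from the question, plus a null array.
val rows: Seq[Seq[String]] = Seq(
  Seq(null, null, null),
  Seq(null, null),
  Seq("Mustang", null),
  Seq("Pilot", "Jeep"),
  null
)

val cleaned: Seq[Seq[String]] = rows.map(nullSafeStrings)
```

If this behaviour suits your data, the same function should be usable in place of the lambda above, for example via udf(nullSafeStrings _).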