Scala: replace whitespace-only values with null using regexp_replace

If several columns contain whitespace-only values, how can I replace those values with null?
Input dataset:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
| 0|104 | |
| 1| | |
+---+-----+-----+
Desired output:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
| 0|104 | Null|
| 1|Null | Null|
+---+-----+-----+
Use one withColumn per column:
import org.apache.spark.sql.functions._
val df = List(("0", "104", " "), ("1", " ", "")).toDF("Id","col_1", "col_2")
val test = df
  .withColumn("col_1", when(regexp_replace(col("col_1"), "\\s+", "") === "", null).otherwise(col("col_1")))
  .withColumn("col_2", when(regexp_replace(col("col_2"), "\\s+", "") === "", null).otherwise(col("col_2")))
.show
Result:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
| 0| 104| null|
| 1| null| null|
+---+-----+-----+
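With many columns, one `withColumn` call per column becomes repetitive. A minimal sketch that folds the same blank-to-null rule over every string-typed column instead (assuming the `df` built above and a running Spark session; `trim` stands in for the `regexp_replace` check, since both reduce a whitespace-only value to the empty string):

```scala
import org.apache.spark.sql.functions.{col, lit, trim, when}

// Collect the names of all string-typed columns.
val stringCols = df.schema.fields
  .filter(_.dataType.typeName == "string")
  .map(_.name)

// Fold the blank-to-null rule over each of them.
val cleaned = stringCols.foldLeft(df) { (acc, c) =>
  acc.withColumn(c, when(trim(col(c)) === "", lit(null)).otherwise(col(c)))
}
cleaned.show()
```

This keeps the logic in one place, at the cost of rewriting every string column, including ones (like `Id` here) that never contain blanks.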
Hi, you can do it like this:
scala> val someDFWithName = Seq((1, "anurag", ""), (5, "", "")).toDF("id", "name", "age")
someDFWithName: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
scala> someDFWithName.show
+---+------+---+
| id| name|age|
+---+------+---+
| 1|anurag| |
| 5| | |
+---+------+---+
scala> someDFWithName.na.replace(Seq("name","age"),Map(""-> null)).show
+---+------+----+
| id| name| age|
+---+------+----+
| 1|anurag|null|
| 5| null|null|
+---+------+----+
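Note that `na.replace` matches exact values only, so a whitespace-only cell such as `" "` slips through. A sketch of one way to cover that case (assuming the same `someDFWithName`) is to trim the columns first, so whitespace-only values collapse to `""` before the replace:

```scala
import org.apache.spark.sql.functions.{col, trim}

// Trim first so "  " becomes "", which na.replace can then match.
val trimmed = someDFWithName
  .withColumn("name", trim(col("name")))
  .withColumn("age", trim(col("age")))

trimmed.na.replace(Seq("name", "age"), Map("" -> null)).show()
```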
Or you can also try this:
scala> someDFWithName.withColumn("Name", when(col("Name") === "", null).otherwise(col("Name"))).withColumn("Age", when(col("Age") === "", null).otherwise(col("Age"))).show
+---+------+----+
| id| name| age|
+---+------+----+
| 1|anurag|null|
| 5| null|null|
+---+------+----+
Or, for values containing multiple spaces, try the following:
scala> val someDFWithName = Seq(("n", "a"), ( "", "n"), (" ", ""), (" ", "a"), (" ",""), (" "," "), ("c"," ")).toDF("name", "place")
someDFWithName: org.apache.spark.sql.DataFrame = [name: string, place: string]
scala> someDFWithName.withColumn("Name", when(regexp_replace(col("name"),"\\s+","") === "", null).otherwise(col("Name"))).withColumn("Place", when(regexp_replace(col("place"),"\\s+","") === "", null).otherwise(col("place"))).show
+----+-----+
|Name|Place|
+----+-----+
| n| a|
|null| n|
|null| null|
|null| a|
|null| null|
|null| null|
| c| null|
+----+-----+
I hope this helps. Thanks.
Comments:

I tried it in Databricks, but it doesn't work. Please see the code below; the blank values were not replaced in the output:
odr.na.replace(Seq("Name", "Place"), Map("" -> null)).show

Please make changes to the query as needed; don't blindly copy-paste.

Not working. I tried it as you directed. Is there anything I'm missing?

Can you provide all your queries, such as how you created the DF? What do you want to filter out?

@Learner I have an Excel file that I'm importing from DBFS to create the DataFrame. Many of the columns in my file contain blank spaces, and I want to replace all of those blanks with null.

With .withColumn("Name", if(col("Name").getItem().toString().replaceAll(" ", "").equals("")) lit(null) else col("Name")) I get a deprecation warning ("Adaptation of argument list by inserting () has been deprecated: leaky (Object-receiving) target makes this especially dangerous. signature: Column.getItem(key: Any): org.apache.spark.sql.Column; after adaptation: Column.getItem((): Unit)") followed by the error: java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit ()

@praveensasini Fixed now.