Scala: replace whitespace (including multiple spaces) with null using regexp_replace


If multiple columns contain only whitespace, how can I replace those values with null?

Input dataset:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|104  |     |
|  1|     |     |
+---+-----+-----+
Expected output:

+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|104  | Null|
|  1|Null | Null|
+---+-----+-----+

Use one withColumn per column:

import org.apache.spark.sql.functions._
val df = List(("0", "104", "    "), ("1", " ", "")).toDF("Id","col_1", "col_2")

val test = df
  .withColumn("col_1", when(regexp_replace(col("col_1"), "\\s+", "") === "", null).otherwise(col("col_1")))
  .withColumn("col_2", when(regexp_replace(col("col_2"), "\\s+", "") === "", null).otherwise(col("col_2")))

test.show
Result:

+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
|  0|  104| null|
|  1| null| null|
+---+-----+-----+
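If there are many columns, writing one `withColumn` per column gets repetitive. The same `when`/`regexp_replace` rule can be applied over a list of column names with `foldLeft` (a sketch; the column list `cols` is an assumption, adjust it to your schema):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical list of columns to clean -- replace with your own.
val cols = Seq("col_1", "col_2")

// For each listed column, map values that are empty after stripping
// all whitespace to null, and leave every other value unchanged.
def blanksToNull(df: DataFrame, columns: Seq[String]): DataFrame =
  columns.foldLeft(df) { (acc, c) =>
    acc.withColumn(c,
      when(regexp_replace(col(c), "\\s+", "") === "", lit(null)).otherwise(col(c)))
  }

val cleaned = blanksToNull(df, cols)
cleaned.show
```

This produces the same result as the chained `withColumn` calls above, but scales to any number of columns.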


Hi, you can do it like this:

scala> val someDFWithName = Seq((1, "anurag", ""), (5, "", "")).toDF("id", "name", "age")
someDFWithName: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]

scala> someDFWithName.show
+---+------+---+
| id|  name|age|
+---+------+---+
|  1|anurag|   |
|  5|      |   |
+---+------+---+
scala> someDFWithName.na.replace(Seq("name","age"),Map(""-> null)).show
+---+------+----+
| id|  name| age|
+---+------+----+
|  1|anurag|null|
|  5|  null|null|
+---+------+----+
Or you can also try this:

scala> someDFWithName.withColumn("Name", when(col("Name") === "", null).otherwise(col("Name"))).withColumn("Age", when(col("Age") === "", null).otherwise(col("Age"))).show
+---+------+----+
| id|  name| age|
+---+------+----+
|  1|anurag|null|
|  5|  null|null|
+---+------+----+
Or, for cells containing multiple spaces, try this:

scala> val someDFWithName = Seq(("n", "a"), ( "", "n"), ("         ", ""), ("  ", "a"), ("   ",""), ("        ","   "), ("c"," ")).toDF("name", "place")
someDFWithName: org.apache.spark.sql.DataFrame = [name: string, place: string]

scala> someDFWithName.withColumn("Name", when(regexp_replace(col("name"),"\\s+","") === "", null).otherwise(col("Name"))).withColumn("Place", when(regexp_replace(col("place"),"\\s+","") === "", null).otherwise(col("place"))).show
+----+-----+
|Name|Place|
+----+-----+
|   n|    a|
|null|    n|
|null| null|
|null|    a|
|null| null|
|null| null|
|   c| null|
+----+-----+
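If you do not want to list the columns by hand, you can also derive the string columns from the DataFrame's schema and apply the same rule to each of them in a single `select` (a sketch, assuming every string column should be cleaned):

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StringType

// Pick out only the StringType columns from the schema.
val stringCols = someDFWithName.schema.fields
  .collect { case f if f.dataType == StringType => f.name }

// Apply the whitespace-to-null rule to every string column;
// non-string columns pass through unchanged.
val cleaned = someDFWithName.select(
  someDFWithName.columns.map { c =>
    if (stringCols.contains(c))
      when(regexp_replace(col(c), "\\s+", "") === "", lit(null)).otherwise(col(c)).as(c)
    else col(c)
  }: _*
)
cleaned.show
```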

I hope this helps. Thanks!


I tried this in Databricks but it doesn't work. The output of the code below still shows the blank cells:

odr.na.replace(Seq("Name", "Place"), Map("" -> null)).show

Please adapt the query to your data as needed; don't copy-paste it blindly. — It doesn't work; I tried it exactly as you suggested. Is there something I'm missing? — Can you share your full query, e.g. how you create the DataFrame and what exactly you want to filter out? — @Learner I have an Excel file that I import from DBFS to create the DataFrame. Many of the columns contain blank cells, and I want to replace all of them with null.

With .withColumn("Name", if (col("Name").getItem().toString().replaceAll(" ", "").equals("")) lit(null) else col("Name")) I get this warning: notebook:2: warning: Adapting argument list by inserting () has been deprecated: leaky (Object-receiving) target makes this especially dangerous. signature: Column.getItem(key: Any): org.apache.spark.sql.Column; after adaptation: Column.getItem((): Unit) — followed by the error: java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit () — @praveensasini fixed now.
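As the comments suggest, na.replace(..., Map("" -> null)) only matches cells that are exactly the empty string, so cells containing one or more spaces are left untouched. One workaround (a sketch, reusing the odr DataFrame and column names from the comments) is to trim the columns first so that whitespace-only cells become empty strings before the replacement runs:

```scala
import org.apache.spark.sql.functions._

// Hypothetical target columns from the comment thread.
val targets = Seq("Name", "Place")

// Trim each target column first, then map the now-empty strings to null.
val trimmed = targets.foldLeft(odr) { (acc, c) => acc.withColumn(c, trim(col(c))) }
trimmed.na.replace(targets, Map("" -> null)).show
```

Note that Spark's `trim` strips leading and trailing space characters; if your cells may contain tabs or other whitespace, the `regexp_replace(col, "\\s+", "")` approach from the answers above is the safer check.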