Scala: replace whitespace-only values with null using regexp_replace

If several columns contain whitespace-only values, how can I replace those values with null?
Input dataset:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
| 0|104 | |
| 1| | |
+---+-----+-----+
Desired output:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
| 0|104 | Null|
| 1|Null | Null|
+---+-----+-----+
Use one withColumn per column:
import org.apache.spark.sql.functions._
val df = List(("0", "104", " "), ("1", " ", "")).toDF("Id","col_1", "col_2")
val test = df
  .withColumn("col_1", when(regexp_replace(col("col_1"), "\\s+", "") === "", null).otherwise(col("col_1")))
  .withColumn("col_2", when(regexp_replace(col("col_2"), "\\s+", "") === "", null).otherwise(col("col_2")))
.show
Result:
+---+-----+-----+
| Id|col_1|col_2|
+---+-----+-----+
| 0| 104| null|
| 1| null| null|
+---+-----+-----+
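With many columns, one `withColumn` call per column becomes repetitive. A minimal sketch that folds the same blank-to-null rule over every string-typed column instead (assuming the `df` built above and a running Spark session; `trim` stands in for the `regexp_replace` check, since both reduce a whitespace-only value to the empty string):

```scala
import org.apache.spark.sql.functions.{col, lit, trim, when}

// Collect the names of all string-typed columns.
val stringCols = df.schema.fields
  .filter(_.dataType.typeName == "string")
  .map(_.name)

// Fold the blank-to-null rule over each of them.
val cleaned = stringCols.foldLeft(df) { (acc, c) =>
  acc.withColumn(c, when(trim(col(c)) === "", lit(null)).otherwise(col(c)))
}
cleaned.show()
```

This keeps the logic in one place, at the cost of rewriting every string column, including ones (like `Id` here) that never contain blanks.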
Hi, you can do it like this:
scala> val someDFWithName = Seq((1, "anurag", ""), (5, "", "")).toDF("id", "name", "age")
someDFWithName: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
scala> someDFWithName.show
+---+------+---+
| id| name|age|
+---+------+---+
| 1|anurag| |
| 5| | |
+---+------+---+
scala> someDFWithName.na.replace(Seq("name","age"),Map(""-> null)).show
+---+------+----+
| id| name| age|
+---+------+----+
| 1|anurag|null|
| 5| null|null|
+---+------+----+
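Note that `na.replace` matches exact values only, so a whitespace-only cell such as `" "` slips through. A sketch of one way to cover that case (assuming the same `someDFWithName`) is to trim the columns first, so whitespace-only values collapse to `""` before the replace:

```scala
import org.apache.spark.sql.functions.{col, trim}

// Trim first so "  " becomes "", which na.replace can then match.
val trimmed = someDFWithName
  .withColumn("name", trim(col("name")))
  .withColumn("age", trim(col("age")))

trimmed.na.replace(Seq("name", "age"), Map("" -> null)).show()
```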
Or you can also try this:
scala> someDFWithName.withColumn("Name", when(col("Name") === "", null).otherwise(col("Name"))).withColumn("Age", when(col("Age") === "", null).otherwise(col("Age"))).show
+---+------+----+
| id| name| age|
+---+------+----+
| 1|anurag|null|
| 5| null|null|
+---+------+----+
Or, for values containing multiple spaces, try the following:
scala> val someDFWithName = Seq(("n", "a"), ( "", "n"), (" ", ""), (" ", "a"), (" ",""), (" "," "), ("c"," ")).toDF("name", "place")
someDFWithName: org.apache.spark.sql.DataFrame = [name: string, place: string]
scala> someDFWithName.withColumn("Name", when(regexp_replace(col("name"),"\\s+","") === "", null).otherwise(col("Name"))).withColumn("Place", when(regexp_replace(col("place"),"\\s+","") === "", null).otherwise(col("place"))).show
+----+-----+
|Name|Place|
+----+-----+
| n| a|
|null| n|
|null| null|
|null| a|
|null| null|
|null| null|
| c| null|
+----+-----+
I hope this helps. Thanks.
Comments:

I tried it in Databricks, but it doesn't work. Please see the code below; the blank values were not replaced in the output:
odr.na.replace(Seq("Name", "Place"), Map("" -> null)).show

Please make changes to the query as needed; don't blindly copy-paste.

Not working. I tried it as you directed. Is there anything I'm missing?

Can you provide all your queries, such as how you created the DF? What do you want to filter out?

@Learner I have an Excel file that I'm importing from DBFS to create the DataFrame. Many of the columns in my file contain blank spaces, and I want to replace all of those blanks with null.

With .withColumn("Name", if(col("Name").getItem().toString().replaceAll(" ", "").equals("")) lit(null) else col("Name")) I get a deprecation warning ("Adaptation of argument list by inserting () has been deprecated: leaky (Object-receiving) target makes this especially dangerous. signature: Column.getItem(key: Any): org.apache.spark.sql.Column; after adaptation: Column.getItem((): Unit)") followed by the error: java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit ()

@praveensasini Fixed now.