Scala: How to replace empty values in a DataFrame column?


How can I replace the empty values in the Field1 column of the DataFrame df?

Field1 Field2
       AA
12     BB

This command does not give the expected result:

df.na.fill("Field1",Seq("Anonymous"))

Expected result:

Field1          Field2
Anonymous       AA
12              BB
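fill: returns a new DataFrame that replaces null or NaN values in numeric columns with the given value.

Two things:

  • An empty string is not null or NaN, so you have to use a case statement for it.
  • fill does not seem to work when you give it a text value for a numeric column.

Failing null replacement with fill and a text value: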

    scala> a.show
    +----+---+
    |  f1| f2|
    +----+---+
    |null| AA|
    |  12| BB|
    +----+---+
    
    scala> a.na.fill("Anonymous", Seq("f1")).show
    +----+---+
    |  f1| f2|
    +----+---+
    |null| AA|
    |  12| BB|
    +----+---+
    
Working example, using null with all numbers:

    scala> a.show
    +----+---+
    |  f1| f2|
    +----+---+
    |null| AA|
    |  12| BB|
    +----+---+
    
    
    scala> a.na.fill(1, Seq("f1")).show
    +---+---+
    | f1| f2|
    +---+---+
    |  1| AA|
    | 12| BB|
    +---+---+
    
Failing example (empty string instead of null):

    scala> b.show
    +---+---+
    | f1| f2|
    +---+---+
    |   | AA|
    | 12| BB|
    +---+---+
    
    
    scala> b.na.fill(1, Seq("f1")).show
    +---+---+
    | f1| f2|
    +---+---+
    |   | AA|
    | 12| BB|
    +---+---+
    
Case statement fix example:

    scala> b.show
    +---+---+
    | f1| f2|
    +---+---+
    |   | AA|
    | 12| BB|
    +---+---+
    
    
    scala> b.select(when(col("f1") === "", "Anonymous").otherwise(col("f1")).as("f1"), col("f2")).show
    +---------+---+
    |       f1| f2|
    +---------+---+
    |Anonymous| AA|
    |       12| BB|
    +---------+---+
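
The two points in this answer can be combined into small helpers. This is only a sketch, not something from the original answer: the object name NullAndBlankDefaults and the helpers fillAsText and replaceBlankOrNull are made up for illustration, and the sample data is hypothetical. It assumes that a string-typed f1 is acceptable in the result, since a text default cannot live in a numeric column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, trim, when}

// Hypothetical helper object, named here for illustration only.
object NullAndBlankDefaults {

  // Cast the column to string first so that a text default is
  // type-compatible with it, then let na.fill replace the nulls.
  def fillAsText(df: DataFrame, column: String, default: String): DataFrame =
    df.withColumn(column, col(column).cast("string"))
      .na.fill(default, Seq(column))

  // Case-statement variant for string columns: null, empty and
  // whitespace-only values are all replaced by the default.
  def replaceBlankOrNull(df: DataFrame, column: String, default: String): DataFrame =
    df.withColumn(
      column,
      when(col(column).isNull || trim(col(column)) === "", lit(default))
        .otherwise(col(column)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-and-blank-defaults")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data: "a" has a nullable integer f1, "b" a string f1.
    val a = Seq((None: Option[Int], "AA"), (Some(12), "BB")).toDF("f1", "f2")
    val b = Seq(("", "AA"), ("12", "BB"), (null, "CC")).toDF("f1", "f2")

    fillAsText(a, "f1", "Anonymous").show()
    replaceBlankOrNull(b, "f1", "Anonymous").show()

    spark.stop()
  }
}
```

In spark-shell the two helper bodies can be pasted on their own, because spark and its implicits are already in scope there.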
    
You can also try this. It may handle blank, empty, and null values all at once:

    df.show()
    +------+------+
    |Field1|Field2|
    +------+------+
    |      |    AA|
    |    12|    BB|
    |    12|  null|
    +------+------+
    
    df.na.replace(Seq("Field1","Field2"),Map(""-> null)).na.fill("Anonymous", Seq("Field2","Field1")).show(false)   
    
    +---------+---------+
    |Field1   |Field2   |
    +---------+---------+
    |Anonymous|AA       |
    |12       |BB       |
    |12       |Anonymous|
    +---------+---------+   
    

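When the DataFrame has n columns, you can try the code below.

Note: the null data type is not supported when you write data to formats such as Parquet, so the column has to be type cast.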

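    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.types.StringType
    // toDF needs spark.implicits._ (already in scope in spark-shell)

    val df = Seq(
      (1, ""),
      (2, "Ram"),
      (3, "Sam"),
      (4, "")
    ).toDF("ID", "Name")

    // Add an all-null column, cast to string so that it has a concrete type

    val inputDf = df.withColumn("NulType", lit(null).cast(StringType))

    // Output of inputDf.show()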
    
    +---+----+-------+
    | ID|Name|NulType|
    +---+----+-------+
    |  1|    |   null|
    |  2| Ram|   null|
    |  3| Sam|   null|
    |  4|    |   null|
    +---+----+-------+
    
    // Replace every empty string in the DataFrame with null

    val colName = inputDf.columns // Array[String] holding all column names

    // Map("" -> "null") would insert the literal text "null", so a real null is used instead
    val data = inputDf.na.replace(colName, Map("" -> null))
    
    data.show()
    +---+----+-------+
    | ID|Name|NulType|
    +---+----+-------+
    |  1|null|   null|
    |  2| Ram|   null|
    |  3| Sam|   null|
    |  4|null|   null|
    +---+----+-------+
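
If the goal is a default value rather than null, the same idea can be limited to the string columns found in the schema. A rough sketch under that assumption follows; the names BlankToDefault, stringColumns and fillBlankStrings are hypothetical, not part of Spark or of the answer above. Empty strings are mapped straight to the default and the remaining nulls are filled afterwards, so no null has to be passed to replace.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Hypothetical helper object, named here for illustration only.
object BlankToDefault {

  // Names of the columns whose declared type is string.
  def stringColumns(df: DataFrame): Seq[String] =
    df.schema.fields.filter(_.dataType == StringType).map(_.name).toSeq

  // Replace empty strings with the default, then fill nulls with it,
  // touching string columns only.
  def fillBlankStrings(df: DataFrame, default: String): DataFrame = {
    val cols = stringColumns(df)
    if (cols.isEmpty) df
    else df.na.replace(cols, Map("" -> default)).na.fill(default, cols)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("blank-to-default")
      .getOrCreate()
    import spark.implicits._

    // Same shape as the sample data above: an integer ID, a string Name
    // with blanks, and an all-null column cast to string.
    val inputDf = Seq((1, ""), (2, "Ram"), (3, "Sam"), (4, ""))
      .toDF("ID", "Name")
      .withColumn("NulType", lit(null).cast(StringType))

    fillBlankStrings(inputDf, "Anonymous").show()

    spark.stop()
  }
}
```

The integer ID column is left alone because it is not in the list passed to replace and fill.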
    

Please add some details: what result did you expect, and what did you get?