How to handle isnull or ifnull in a query using sqlContext in Scala

sql scala apache-spark

I have two data files, as shown below:

course.txt 
id,course 
1,Hadoop
2,Spark
3,HBase
5,Impala

Fee.txt 
id,amount 
2,3900
3,4200
4,2900

I need to list every course together with its fee:

sqlContext.sql("select c.id, c.course, f.amount from course c left outer join fee f on f.id = c.id").show
+---+------+------+
| id|course|amount|
+---+------+------+
|  1|Hadoop|  null|
|  2| Spark|3900.0|
|  3| HBase|4200.0|
|  5|Impala|  null|
+---+------+------+

If a course does not appear in the fee table, I want to display "N/A" instead of null.

I tried the following, without success:

Command 1:

sqlContext.sql("select c.id, c.course, ifnull(f.amount, 'N/A') from course c left outer join fee f on f.id = c.id").show

Error: org.apache.spark.sql.AnalysisException: undefined function ifnull; line 1 pos 40

Command 2:

sqlContext.sql("select c.id, c.course, isnull(f.amount, 'N/A') from course c left outer join fee f on f.id = c.id").show

Error: org.apache.spark.sql.AnalysisException: No handler for Hive udf class org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNull because: The operator 'IS NULL' only accepts 1 argument..; line 1 pos 40

What is the correct way to handle this with sqlContext in Scala? Many thanks.

Use the DataFrame NA functions. Once the join is done, you can use the fill function of DataFrameNaFunctions to replace all null values with a string.
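As a minimal sketch of this approach (not taken from the answer itself), the snippet below rebuilds the question's data in memory, joins it, and fills the nulls. It assumes Spark 2.x or later and a spark-shell or script context; the app name and the variable name `filled` are illustrative. One detail worth noting: `na.fill` with a string value only touches string columns, so the numeric `amount` column is cast to string first.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session; inside spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("na-fill-sketch").getOrCreate()
import spark.implicits._

// Same rows as course.txt and Fee.txt in the question.
val course = Seq((1, "Hadoop"), (2, "Spark"), (3, "HBase"), (5, "Impala")).toDF("id", "course")
val fee = Seq((2, 3900), (3, 4200), (4, 2900)).toDF("id", "amount")

val filled = course
  .join(fee, Seq("id"), "left_outer")
  // fill("N/A", ...) skips non-string columns, so make amount a string first.
  .withColumn("amount", col("amount").cast("string"))
  .na.fill("N/A", Seq("amount"))

filled.orderBy("id").show()
```

The cast means `amount` is a string column from here on, which is usually acceptable for display but not for further arithmetic.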

With the Spark DataFrame API, you can use a when/otherwise condition as follows:
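// spark-shell already has these in scope; elsewhere, `spark` is your SparkSession
import org.apache.spark.sql.functions.when
import spark.implicits._

val course = Seq(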
  (1, "Hadoop"),
  (2, "Spark"),
  (3, "HBase"),
  (5, "Impala")
).toDF("id", "course")

val fee = Seq(
  (2, 3900),
  (3, 4200),
  (4, 2900)
).toDF("id", "amount")

course.join(fee, Seq("id"), "left_outer").
  withColumn("amount", when($"amount".isNull, "N/A").otherwise($"amount")).
  show
// +---+------+------+
// | id|course|amount|
// +---+------+------+
// |  1|Hadoop|   N/A|
// |  2| Spark|  3900|
// |  3| HBase|  4200|
// |  5|Impala|   N/A|
// +---+------+------+

If you prefer Spark SQL, here is an equivalent query:

course.createOrReplaceTempView("coursetable")
fee.createOrReplaceTempView("feetable")

val result = spark.sql("""
  select
    c.id, c.course,
    case when f.amount is null then 'N/A' else f.amount end as amount
  from
    coursetable c left outer join feetable f on f.id = c.id
""")

You can do this in a simple SQL query with the if and isnull functions and an 'N/A' literal.
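A minimal sketch of what such a query might look like, assuming Spark 2.x or later and temp views named `course` and `fee` as in the question. The explicit cast to string is an addition for illustration, so that both branches of `if` have the same type; the app name and the variable name `result` are likewise illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; inside spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("if-isnull-sketch").getOrCreate()
import spark.implicits._

// Register the question's data under the table names used in its queries.
Seq((1, "Hadoop"), (2, "Spark"), (3, "HBase"), (5, "Impala"))
  .toDF("id", "course").createOrReplaceTempView("course")
Seq((2, 3900), (3, 4200), (4, 2900))
  .toDF("id", "amount").createOrReplaceTempView("fee")

// isnull takes one argument and returns a boolean; if picks between the two branches.
val result = spark.sql("""
  select c.id, c.course,
         if(isnull(f.amount), 'N/A', cast(f.amount as string)) as amount
  from course c left outer join fee f on f.id = c.id
""")

result.orderBy("id").show()
```

This also explains the error in Command 2 of the question: `isnull` only tests for null and accepts a single argument, so the replacement value has to come from a surrounding `if`.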

You should get the following output:

+---+------+------+
| id|course|amount|
+---+------+------+
|  1|Hadoop|   N/A|
|  2| Spark|  3900|
|  3| HBase|  4200|
|  5|Impala|   N/A|
+---+------+------+

I hope this answer helps.

If you are using Spark SQL, use the coalesce function:

select
  c.id,
  c.course,
  coalesce(f.amount, 'N/A') as amount
from course c
left outer join fee f
on f.id = c.id
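For comparison, the same idea can be sketched with the DataFrame API, using `coalesce` and `lit` from `org.apache.spark.sql.functions`. This is an illustrative sketch, assuming Spark 2.x or later; the explicit cast to string, the app name, and the variable name `withDefault` are assumptions and not part of the answer above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

// Assumption: a local session; inside spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
import spark.implicits._

val course = Seq((1, "Hadoop"), (2, "Spark"), (3, "HBase"), (5, "Impala")).toDF("id", "course")
val fee = Seq((2, 3900), (3, 4200), (4, 2900)).toDF("id", "amount")

// coalesce returns the first non-null argument, so unmatched courses fall back to "N/A".
val withDefault = course
  .join(fee, Seq("id"), "left_outer")
  .withColumn("amount", coalesce(col("amount").cast("string"), lit("N/A")))

withDefault.orderBy("id").show()
```

Casting `amount` to string up front keeps both arguments of `coalesce` the same type, rather than relying on implicit type coercion between a number and the 'N/A' literal.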

Thank you, Leo, this is what I needed.

In sqlContext, use NVL:

sqlContext.sql("""   
   SELECT c.id
      ,c.course
      ,NVL(f.amount, 'N/A')
      FROM course c
      LEFT OUTER
      JOIN fee f 
      ON f.id = c.id
    """).show()