Apache Spark error: type mismatch; found: Boolean; required: org.apache.spark.sql.Column (Spark/Scala)

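I have two DataFrames in Spark, Df1 and Df2. I join them on a common column, Id, and then add an extra column, Result, that checks several columns with an OR condition: if the data in any of those columns matches, the new column should contain "Matched"; if none of the conditions match, it should contain "Not Matched". I wrote the code below: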

    df1.join(df2,df1("id") === df2("id"))
    .withColumn("Result",when(df1("adhar_no") === df2("adhar_no") || 
    when(df1("pan_no") === df2("pan_no") || 
    when(df1("Voter_id") === df2("Voter_id") ||  
    when(df1("DL_no") === df2("DL_no"),"Matched"))))
    .otherwise("Not Matched"))

But I get the following error message:

    error: type mismatch;
    found   : Boolean
    required: org.apache.spark.sql.Column

Can anyone give me a hint on how to write this query so that it produces the desired output?

You should remove the nested `when` calls and keep a single `when` around the whole OR condition:

    df.join(df, df("id") === df("id"))
      .withColumn("Result",
        when(
          df("adhar_no") === df("adhar_no") ||
          df("pan_no") === df("pan_no") ||
          df("Voter_id") === df("Voter_id") ||
          df("DL_no") === df("DL_no"), "Matched"
        ).otherwise("Not Matched"))

Try the code below:

    df1.join(df2, df1("id") === df2("id"))
      .withColumn("Result",
        when(
          df1("adhar_no") === df2("adhar_no") ||
          df1("pan_no") === df2("pan_no") ||
          df1("Voter_id") === df2("Voter_id") ||
          df1("DL_no") === df2("DL_no"),
          "Matched"
        ).otherwise("Not Matched")
      )
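
For anyone who wants to try the single-`when` version end to end, here is a minimal sketch under a few assumptions: the object name `MatchResultSketch`, the helper `withMatchResult`, the local `SparkSession` settings, and the sample rows are all made up for illustration, and only the column names come from the question. It builds the OR condition by reducing the per-column `===` comparisons with `||`, which should be equivalent to writing the four comparisons out by hand. It also uses `select` rather than `withColumn` so that the output only has `id` and `Result`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, when}

// Hypothetical wrapper object, not part of the original question or answers.
object MatchResultSketch {

  // Joins on `id` and labels each row "Matched" when at least one of the
  // compared columns is equal on both sides, otherwise "Not Matched".
  def withMatchResult(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Column names taken from the question.
    val compared = Seq("adhar_no", "pan_no", "Voter_id", "DL_no")

    // Each `===` yields a Column; `||` between Columns yields another Column,
    // so the result is one boolean Column that can be passed to a single `when`.
    val anyMatch = compared.map(c => df1(c) === df2(c)).reduce(_ || _)

    df1.join(df2, df1("id") === df2("id"))
      .select(
        df1("id"),
        when(anyMatch, lit("Matched")).otherwise(lit("Not Matched")).as("Result")
      )
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, just for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("match-result-sketch")
      .getOrCreate()
    import spark.implicits._

    val cols = Seq("id", "adhar_no", "pan_no", "Voter_id", "DL_no")
    // Made-up rows: id 1 agrees on pan_no only, id 2 agrees on nothing.
    val df1 = Seq((1, "A1", "P1", "V1", "D1"), (2, "A2", "P2", "V2", "D2")).toDF(cols: _*)
    val df2 = Seq((1, "X", "P1", "Y", "Z"), (2, "X", "Y", "Z", "W")).toDF(cols: _*)

    withMatchResult(df1, df2).show()
    spark.stop()
  }
}
```

With these sample rows, `id` 1 should come out as "Matched" (only `pan_no` agrees) and `id` 2 as "Not Matched". Keeping the column names in a `Seq` is just a convenience, so adding another column to compare is a one-word change.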


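Thanks @chlebek, yes, that worked for me. It turns out I had also used the `drop` method in my code: in one place I had called `drop` on `pan_no`, and that is why I was getting the error.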