Apache Spark error: type mismatch; found: Boolean; required: org.apache.spark.sql.Column :: Spark/Scala
I have two DataFrames in Spark, df1 and df2. I join the two on a common column (id) and then add an extra column, Result, which checks several columns with an OR condition: if the data in any of those columns matches, I need to put "Matched" in the new column; if none of the conditions match, "Not Matched" should go in that column. I am writing the code below:
df1.join(df2, df1("id") === df2("id"))
  .withColumn("Result", when(df1("adhar_no") === df2("adhar_no") ||
    when(df1("pan_no") === df2("pan_no") ||
      when(df1("Voter_id") === df2("Voter_id") ||
        when(df1("DL_no") === df2("DL_no"), "Matched"))))
  .otherwise("Not Matched"))
But I get the error message below:
error: type mismatch;
found : Boolean
required: org.apache.spark.sql.Column
Can anyone give me a hint on how I should write the query to produce the required output?

You should remove the extra nested `when(...)` calls. `when` expects a single condition of type `org.apache.spark.sql.Column`, so combine all of the comparisons with `||` inside one `when` and attach a single `.otherwise` to it.
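The type error itself is about what `when` accepts. This can be illustrated without Spark at all; the `Column` class and `when` function below are toy stand-ins for Spark's API (not the real implementations), kept just close enough to show why a plain Scala `Boolean` is rejected where a `Column` is required:

```scala
// Toy stand-in for Spark's Column: comparisons build an expression tree
// instead of evaluating immediately. This is NOT Spark's real API.
case class Column(expr: String) {
  def ===(other: Column): Column = Column(s"($expr = ${other.expr})")
  def ||(other: Column): Column  = Column(s"($expr OR ${other.expr})")
}

// Mirrors the shape of Spark's `when`: the condition must be a Column.
def when(condition: Column, value: String): String =
  s"CASE WHEN ${condition.expr} THEN '$value' END"

val a = Column("df1.adhar_no")
val b = Column("df2.adhar_no")

// `a === b` builds a Column expression, so `when` accepts it:
println(when(a === b, "Matched"))

// By contrast, `a == b` is plain Scala equality and yields a Boolean;
// passing that to `when` would not compile:
//   found: Boolean, required: Column
```

The same mismatch appears whenever the parentheses leave `when` with anything other than a full `Column` condition, which is what the misnested calls in the question did.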
Try the code below:
df1.join(df2, df1("id") === df2("id"))
  .withColumn("Result",
    when(
      df1("adhar_no") === df2("adhar_no") ||
      df1("pan_no") === df2("pan_no") ||
      df1("Voter_id") === df2("Voter_id") ||
      df1("DL_no") === df2("DL_no"),
      "Matched"
    ).otherwise("Not Matched")
  )
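As a side note, when the list of columns to compare grows, the combined condition can be built with a fold instead of writing each `||` by hand. The sketch below uses a toy `Column` class (not Spark's API) so it runs without a Spark dependency; in real Spark code the identical `map(...).reduce(_ || _)` pattern works on genuine `org.apache.spark.sql.Column` objects:

```scala
// Toy Column whose operators build an expression string; a stand-in for
// org.apache.spark.sql.Column so the example runs without Spark.
case class Column(expr: String) {
  def ===(other: Column): Column = Column(s"($expr = ${other.expr})")
  def ||(other: Column): Column  = Column(s"($expr OR ${other.expr})")
}

val colNames = Seq("adhar_no", "pan_no", "Voter_id", "DL_no")

// OR together df1(c) === df2(c) for every column name.
val condition = colNames
  .map(c => Column(s"df1.$c") === Column(s"df2.$c"))
  .reduce(_ || _)

println(condition.expr)
```

With real Spark Columns, `condition` would then be passed straight into `when(condition, "Matched").otherwise("Not Matched")`.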
Thanks @chlebek, yes, this worked for me. In fact I was also using the drop method in my code, and in one place I had applied drop to the DataFrame's pan_no column, which is what caused the error.