Apache Spark: SQL that worked on Spark 2.0 throws an exception after upgrading to Spark 2.3
I could not find any documentation on backward compatibility in Spark 2.3. Some Spark SQL queries that run successfully on Spark 2.0 now fail on Spark 2.3 with the exception below. It seems related to the nested query against the same table, but I am not sure why the defined aliases do not resolve it.
val dataTable = spark.sql("select * from mydb.mytable a where a.version_no in (select cast(max(cast(b.version_no as int)) as string) as version_no from mydb.mytable b)")
val lastUpdateDate = dataTable.select("value").where(dataTable("item") <=> "lastUpdateDate").rdd.map(_.getString(0)).toLocalIterator.toList.head
The exception is:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s) item#15 missing from value#16 in oper
!Filter (item#15 <=> lastUpdateDate)
+- AnalysisBarrier
+- Project [value#16]
+- Project [item#15, value#16, description#17, version_no#18, name#19, date#20]
+- Filter version_no#18 IN (list#14 [])
: +- Aggregate [cast(max(cast(version_no#24 as int)) as string) AS version_no#13]
: +- SubqueryAlias b
: +- SubqueryAlias mytable
: +- HiveTableRelation `mydb`.`mytable`, org.apache.hadoop.hive.serde2.OpenCSVSerd
]
+- SubqueryAlias a
+- SubqueryAlias mytable
+- HiveTableRelation `mydb`.`mytable`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [it
You need to swap the filter and the projection in your code. Try this:
val lastUpdateDate = dataTable.where(dataTable("item") <=> "lastUpdateDate").select("value").rdd.map(_.getString(0)).first
You select only the value field, then try to filter on a different field, item. It makes sense that this cannot work: you need to filter before selecting the value column.

@baitmbarek Thanks. So where exactly should the filter go, before selecting value?

Try this: val lastUpdateDate = dataTable.where(dataTable("item") <=> "lastUpdateDate").select("value").rdd.map(_.getString(0)).first

@baitmbarek That worked, thank you very much! So with the Spark DataFrame API the operations should be written in the opposite order from regular SQL: the filter (the where clause) comes first, then the projection (select). Please post your comment as an answer and I will close this question.
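The failing and working orderings can be reproduced on a toy DataFrame. The following is a minimal sketch, assuming a local SparkSession; the sample rows and object name are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object FilterBeforeSelect {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("filter-before-select")
      .getOrCreate()
    import spark.implicits._

    val dataTable = Seq(
      ("lastUpdateDate", "2019-01-01"),
      ("owner", "etl")
    ).toDF("item", "value")

    // Fails on Spark 2.3 with "Resolved attribute(s) item#... missing":
    // after select("value"), the projection no longer carries the "item"
    // attribute, so the analyzer cannot resolve the filter condition.
    // dataTable.select("value").where(dataTable("item") <=> "lastUpdateDate")

    // Works: filter while "item" is still in scope, then project "value".
    val lastUpdateDate = dataTable
      .where(dataTable("item") <=> "lastUpdateDate")
      .select("value")
      .rdd.map(_.getString(0))
      .first

    println(lastUpdateDate) // prints "2019-01-01"
    spark.stop()
  }
}
```

The commented-out line is the shape of the original code; uncommenting it reproduces the AnalysisException from the question on Spark 2.3.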