
Apache Spark: Spark 2.0 SQL throws an exception after upgrading to Spark 2.3

Tags: apache-spark, apache-spark-sql


I could not find any documentation on Spark 2.3's backward compatibility.

Some of our Spark queries run successfully on Spark 2.0 but throw the exception below on Spark 2.3. The problem seems related to the nested query over the same table, but I am not sure why the defined aliases do not resolve it:

val dataTable = spark.sql("select * from mydb.mytable a where a.version_no in (select cast(max(cast(b.version_no as int)) as string) as version_no from mydb.mytable b)")
val lastUpdateDate = dataTable.select("value").where(dataTable("item") <=> "lastUpdateDate").rdd.map(_.getString(0)).toLocalIterator.toList.head
The exception is:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s) item#15 missing from value#16 in oper
!Filter (item#15 <=> lastUpdateDate)                                                                                                
+- AnalysisBarrier                                                                                                            
  +- Project [value#16]                                                                                                   
     +- Project [item#15, value#16, description#17, version_no#18, name#19, date#20]                       
        +- Filter version_no#18 IN (list#14 [])                                                                           
           :  +- Aggregate [cast(max(cast(version_no#24 as int)) as string) AS version_no#13]                             
           :     +- SubqueryAlias b                                                                                       
           :        +- SubqueryAlias mytable                                                          
           :           +- HiveTableRelation `mydb`.`mytable`, org.apache.hadoop.hive.serde2.OpenCSVSerd
]                                                                                                                             
           +- SubqueryAlias a                                                                                             
              +- SubqueryAlias mytable                                                                
                 +- HiveTableRelation `mydb`.`mytable`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [it
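
The key line is "Resolved attribute(s) item#15 missing from value#16": after select("value") the plan contains only the value column, and Spark 2.3's stricter analyzer (note the new AnalysisBarrier node) refuses to resolve a later filter on the pruned item column, where Spark 2.0 was lenient. A minimal sketch that reproduces the same failure on a toy DataFrame (hypothetical data, assuming an active SparkSession in scope as spark):

// Toy two-column DataFrame standing in for mydb.mytable (hypothetical data).
import spark.implicits._

val df = Seq(("lastUpdateDate", "2020-01-01"), ("otherItem", "x")).toDF("item", "value")

// Projecting first prunes `item` from the plan...
val projected = df.select("value")

// ...so filtering afterwards references an attribute that is missing from
// the child plan and fails analysis on Spark 2.3 (left commented out):
// projected.where(df("item") <=> "lastUpdateDate")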

You need to swap the order of the filter and the projection in your code.

Try this:

val lastUpdateDate = dataTable.where(dataTable("item") <=> "lastUpdateDate").select("value").rdd.map(_.getString(0)).first
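
For completeness, a self-contained sketch of the corrected flow (assuming Hive support and the mydb.mytable layout from the question; the application name is arbitrary):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("last-update-date")   // arbitrary name, not from the question
  .enableHiveSupport()
  .getOrCreate()

val dataTable = spark.sql(
  """select * from mydb.mytable a
    |where a.version_no in
    |  (select cast(max(cast(b.version_no as int)) as string) as version_no
    |   from mydb.mytable b)""".stripMargin)

// Filter while `item` is still present in the plan, then project `value`.
val lastUpdateDate = dataTable
  .where(dataTable("item") <=> "lastUpdateDate")
  .select("value")
  .rdd
  .map(_.getString(0))
  .first()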

Comments:

- You select only the value field and then try to filter on a different field, item. It makes sense that this cannot work: you need to filter before selecting the value column.
- @baitmbarek Thanks. So the filter should go where, before selecting value?
- Try this: val lastUpdateDate = dataTable.where(dataTable("item") <=> "lastUpdateDate").select("value").rdd.map(_.getString(0)).first
- @baitmbarek That worked, thank you very much! So the DataFrame code has to be written in the opposite order from plain SQL: the filter (where clause) comes first, then the projection (select). Please post your comment as an answer and I will close the question.
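
As the comments conclude, the DataFrame API applies transformations in the order they are written, whereas SQL clauses are declarative and reordered by the optimizer. If that ordering keeps tripping you up, the whole lookup can also be pushed into a single SQL statement (a sketch reusing the question's table; <=> is Spark SQL's null-safe equality operator):

// Single-statement alternative: let the optimizer handle clause ordering.
val lastUpdateDate = spark.sql(
  """select value from mydb.mytable a
    |where a.item <=> 'lastUpdateDate'
    |  and a.version_no in
    |    (select cast(max(cast(b.version_no as int)) as string)
    |     from mydb.mytable b)""".stripMargin)
  .rdd.map(_.getString(0)).first()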