Scala Spark SQL: Catalyst is scanning columns it does not need
I have two scenarios, as shown below:
scala> val dfA = sqlContext.read.parquet("/home/mohit/ruleA")
dfA: org.apache.spark.sql.DataFrame = [aid: int, aVal: string]
scala> val dfB = sqlContext.read.parquet("/home/mohit/ruleB")
dfB: org.apache.spark.sql.DataFrame = [bid: int, bVal: string]
scala> dfA.registerTempTable("A")
scala> dfB.registerTempTable("B")
1. Left join with the filter in the WHERE clause
sqlContext.sql("select A.aid, B.bid from A left join B on A.aid=B.bid where B.bid<2").explain
== Physical Plan ==
Project [aid#15,bid#17]
+- Filter (bid#17 < 2)
+- BroadcastHashOuterJoin [aid#15], [bid#17], LeftOuter, None
:- Scan ParquetRelation[aid#15,aVal#16] InputPaths: file:/home/mohit/ruleA
+- Scan ParquetRelation[bid#17,bVal#18] InputPaths: file:/home/mohit/ruleB
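Note that this query never uses aVal, yet the scan of A reads it. Also, because the `B.bid < 2` filter runs after the join, it discards every null-extended row, so the outer join is semantically an inner join against a pre-filtered B. A plain-Scala sketch of what this plan computes (rows invented for illustration, no Spark needed):

```scala
// Plain-Scala model of the plan above. Row contents are made up.
val a = Seq((1, "a1"), (2, "a2"), (4, "a4")) // (aid, aVal)
val b = Seq((1, "b1"), (2, "b2"), (3, "b3")) // (bid, bVal)

// Left outer join on aid = bid: unmatched rows of A get bid = None.
val joined: Seq[(Int, Option[Int])] =
  a.map { case (aid, _) =>
    (aid, b.collectFirst { case (bid, _) if bid == aid => bid })
  }

// WHERE B.bid < 2 runs after the join, so null-extended rows are dropped
// and the outer join degenerates into an inner join with B pre-filtered.
val case1 = joined.collect { case (aid, Some(bid)) if bid < 2 => (aid, bid) }

val innerEquivalent =
  for ((aid, _) <- a; (bid, _) <- b; if aid == bid && bid < 2) yield (aid, bid)

println(case1)                    // List((1,1))
println(case1 == innerEquivalent) // true
```

Only `aid` and `bid` are ever consumed, which is why the scan of aVal looks unnecessary.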
2. Left join with the filter in the join condition
sqlContext.sql("select A.aid, B.bid from A left join B on A.aid=B.bid and B.bid<2").explain
== Physical Plan ==
Project [aid#15,bid#17]
+- BroadcastHashOuterJoin [aid#15], [bid#17], LeftOuter, None
:- Scan ParquetRelation[aid#15] InputPaths: file:/home/mohit/ruleA
+- Filter (bid#17 < 2)
+- Scan ParquetRelation[bid#17] InputPaths: file:/home/mohit/ruleB, PushedFilters: [LessThan(bid,2)]
I raised this question with the Spark project; it is a genuine bug in Spark 1.6. It does look like a bug, so if nobody can help you here, posting it on the Spark developer group would be a good idea.
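For contrast, case 2 is not the same query as case 1: with the predicate in the ON clause, B is filtered before the outer join, so unmatched rows of A survive with a null (None) bid. A plain-Scala sketch with the same invented rows:

```scala
// Plain-Scala model of case 2: filter B first, then left outer join.
val a2 = Seq((1, "a1"), (2, "a2"), (4, "a4")) // (aid, aVal)
val b2 = Seq((1, "b1"), (2, "b2"), (3, "b3")) // (bid, bVal)

// Filter (bid < 2) sits below the join, matching the plan above.
val bFiltered = b2.filter { case (bid, _) => bid < 2 }

// Left outer join A with the pre-filtered B: all A rows are kept.
val case2: Seq[(Int, Option[Int])] =
  a2.map { case (aid, _) =>
    (aid, bFiltered.collectFirst { case (bid, _) if bid == aid => bid })
  }

println(case2) // List((1,Some(1)), (2,None), (4,None))
```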