Scala Spark SQL: Catalyst is scanning unneeded columns

I have the following two scenarios:

scala> val dfA = sqlContext.read.parquet("/home/mohit/ruleA")
dfA: org.apache.spark.sql.DataFrame = [aid: int, aVal: string]

scala> val dfB = sqlContext.read.parquet("/home/mohit/ruleB")
dfB: org.apache.spark.sql.DataFrame = [bid: int, bVal: string]

scala> dfA.registerTempTable("A")

scala> dfB.registerTempTable("B")
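
For anyone reproducing this, the two Parquet inputs can be generated along the following lines. This is a minimal sketch of mine; the sample rows and the use of toDF are assumptions, not part of the original post:

import sqlContext.implicits._

// Hypothetical sample data matching the schemas above:
// [aid: int, aVal: string] and [bid: int, bVal: string].
Seq((1, "a1"), (2, "a2"), (3, "a3"))
  .toDF("aid", "aVal")
  .write.parquet("/home/mohit/ruleA")

Seq((1, "b1"), (2, "b2"), (3, "b3"))
  .toDF("bid", "bVal")
  .write.parquet("/home/mohit/ruleB")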
1. Left join with the filter in the WHERE clause

sqlContext.sql("select A.aid, B.bid from A left join B on A.aid=B.bid where B.bid<2").explain

== Physical Plan ==
Project [aid#15,bid#17]
+- Filter (bid#17 < 2)
   +- BroadcastHashOuterJoin [aid#15], [bid#17], LeftOuter, None
      :- Scan ParquetRelation[aid#15,aVal#16] InputPaths: file:/home/mohit/ruleA
      +- Scan ParquetRelation[bid#17,bVal#18] InputPaths: file:/home/mohit/ruleB
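
Note that both Parquet scans above read aVal and bVal even though the query selects only aid and bid. For comparison, the same query can be written with the DataFrame API; this is a sketch of mine, not from the original post, and its plan would need to be checked with explain in the same way:

// Hypothetical DataFrame-API equivalent of the SQL above.
dfA.join(dfB, dfA("aid") === dfB("bid"), "left_outer")
  .where(dfB("bid") < 2)
  .select(dfA("aid"), dfB("bid"))
  .explain()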

2. Left join with the filter in the join condition

sqlContext.sql("select A.aid, B.bid from A left join B on A.aid=B.bid and B.bid<2").explain

== Physical Plan ==
Project [aid#15,bid#17]
+- BroadcastHashOuterJoin [aid#15], [bid#17], LeftOuter, None
   :- Scan ParquetRelation[aid#15] InputPaths: file:/home/mohit/ruleA
   +- Filter (bid#17 < 2)
      +- Scan ParquetRelation[bid#17] InputPaths: file:/home/mohit/ruleB, PushedFilters: [LessThan(bid,2)]

In the first scenario Catalyst scans the unneeded columns aVal and bVal, while in the second it scans only aid and bid. Why does moving the filter from the WHERE clause into the join condition change which columns are scanned?

Update: I raised this question with the Spark community. It is a genuine bug in Spark 1.6.

It looks like a bug - if nobody can help you here, posting it to the Spark Developer group would be a good idea.
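
Until that bug is fixed, one way to sidestep it - my suggestion, not part of either answer - is to project the join columns explicitly before joining, so the Parquet scans can never pick up aVal or bVal no matter where the filter sits:

// Hypothetical workaround: prune columns by hand before the join.
val prunedA = dfA.select("aid")
val prunedB = dfB.select("bid")

prunedA.join(prunedB, prunedA("aid") === prunedB("bid"), "left_outer")
  .where(prunedB("bid") < 2)
  .explain()

With the manual projection in place, the scans should list only aid and bid regardless of how Catalyst handles the outer-join filter.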