Apache Spark: understanding the plan tree string representation


I have a simple join query:

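  import org.apache.spark.sql.SparkSession

  // Assumed to run inside a ScalaTest suite, which provides test(...).
  test("SparkSQLTest 0005") {
    val spark = SparkSession.builder().master("local").appName("SparkSQLTest 0005").getOrCreate()
    // Register two ranges of ids as temporary views.
    spark.range(100, 100000).createOrReplaceTempView("t1")
    spark.range(2000, 10000).createOrReplaceTempView("t2")
    val df = spark.sql("select count(1) from t1 join t2 on t1.id = t2.id")
    // extended = true prints the parsed, analyzed, optimized and physical plans.
    df.explain(true)
  }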
The output is shown below.

I have marked five questions, Q0 through Q4, as comments in the output. Could someone help explain them? Thanks.

== Parsed Logical Plan ==
'Project [unresolvedalias('count(1), None)] //Q0, Why the first line has no +- or :-
+- 'Join Inner, ('t1.id = 't2.id)    //Q1, What does +- mean
   :- 'UnresolvedRelation `t1`       //Q2 What does :- mean
   +- 'UnresolvedRelation `t2`

== Analyzed Logical Plan ==
count(1): bigint
Aggregate [count(1) AS count(1)#9L]
+- Join Inner, (id#0L = id#2L)
   :- SubqueryAlias t1
   :  +- Range (100, 100000, step=1, splits=Some(1)) //Q3 What does :  +- mean?
   +- SubqueryAlias t2
      +- Range (2000, 10000, step=1, splits=Some(1))

== Optimized Logical Plan ==
Aggregate [count(1) AS count(1)#9L]
+- Project
   +- Join Inner, (id#0L = id#2L)
      :- Range (100, 100000, step=1, splits=Some(1)) //Q4 These two Ranges are both Join's children, why one is :- and the other is +-
      +- Range (2000, 10000, step=1, splits=Some(1)) //Q4

== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#9L])
+- *(2) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#11L])
   +- *(2) Project
      +- *(2) BroadcastHashJoin [id#0L], [id#2L], Inner, BuildRight
         :- *(2) Range (100, 100000, step=1, splits=1)
         +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
            +- *(1) Range (2000, 10000, step=1, splits=1)

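The markers are simply bullet points that represent ordered, nested operations. This outline:

  • Header
    • Child 1
      • Grandchild 1
    • Child 2
      • Grandchild 2
      • Grandchild 3
    • Child 3
would be written as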

Header
:- Child 1
:  +- Grandchild 1
:- Child 2
:  :- Grandchild 2
:  +- Grandchild 3
+- Child 3
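  • (no marker)
    The root of the tree; it has no parent, so nothing is drawn in front of it
  • +-
    A direct child that is the last (or only) child of its parent
  • :-
    A direct child that has further siblings after it, i.e. not the last child
  • :  +-
    The last grandchild, whose parent is not a last child (the leading : continues the parent's sibling line)
  • :  :-
    A grandchild with further siblings after it, whose parent is also not a last child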
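
The rules above can be sketched as a small renderer. This is only an illustration of the notation, not Spark's actual implementation: `Node` and `PlanTree.render` are hypothetical names invented for this example. Each node remembers, for every ancestor level, whether that ancestor was a last child; non-last ancestors contribute `:  ` to the indent, last ancestors contribute three spaces, and the node's own marker is `+- ` or `:- `.

```scala
// Hypothetical minimal tree type for illustration; this is not Spark's TreeNode.
final case class Node(label: String, children: List[Node] = Nil)

object PlanTree {
  // Renders a node and all of its descendants, one line per node.
  def render(root: Node): String = {
    // lastFlags holds one entry per level below the root:
    // true if the node on that level is the last child of its parent.
    def go(n: Node, lastFlags: List[Boolean]): List[String] = {
      val prefix = lastFlags match {
        case Nil => "" // the root has no parent, so it gets no marker
        case flags =>
          // Ancestors that still have siblings below them keep a ":" column open.
          val indent = flags.init.map(last => if (last) "   " else ":  ").mkString
          val marker = if (flags.last) "+- " else ":- "
          indent + marker
      }
      val lastIndex = n.children.size - 1
      val childLines = n.children.zipWithIndex.flatMap { case (child, i) =>
        go(child, lastFlags :+ (i == lastIndex))
      }
      (prefix + n.label) :: childLines
    }
    go(root, Nil).mkString("\n")
  }
}
```

Calling `PlanTree.render` on the Header tree from this answer reproduces the seven lines shown above. It also shows why the two `Range` nodes under `Join` differ: the first is not the last child and gets `:-`, while the second is the last child and gets `+-`.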
