Apache Spark: understanding the plan tree string representation
I have a simple join query:
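import org.apache.spark.sql.SparkSession

test("SparkSQLTest 0005") {
  val spark = SparkSession.builder().master("local").appName("SparkSQLTest 0005").getOrCreate()
  // Register two ranges of ids as temporary views so they can be joined in SQL
  spark.range(100, 100000).createOrReplaceTempView("t1")
  spark.range(2000, 10000).createOrReplaceTempView("t2")
  val df = spark.sql("select count(1) from t1 join t2 on t1.id = t2.id")
  // extended = true prints the parsed, analyzed, optimized and physical plans
  df.explain(true)
}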
The output is shown below. I have marked five questions, Q0 to Q4, as comments in it. Could someone help me understand them? Thanks.
== Parsed Logical Plan ==
'Project [unresolvedalias('count(1), None)] //Q0, Why the first line has no +- or :-
+- 'Join Inner, ('t1.id = 't2.id) //Q1, What does +- mean
   :- 'UnresolvedRelation `t1` //Q2 What does :- mean
   +- 'UnresolvedRelation `t2`

== Analyzed Logical Plan ==
count(1): bigint
Aggregate [count(1) AS count(1)#9L]
+- Join Inner, (id#0L = id#2L)
   :- SubqueryAlias t1
   :  +- Range (100, 100000, step=1, splits=Some(1)) //Q3 What does : +- mean?
   +- SubqueryAlias t2
      +- Range (2000, 10000, step=1, splits=Some(1))

== Optimized Logical Plan ==
Aggregate [count(1) AS count(1)#9L]
+- Project
   +- Join Inner, (id#0L = id#2L)
      :- Range (100, 100000, step=1, splits=Some(1)) //Q4 These two Ranges are both Join's children, why one is :- and the other is +-
      +- Range (2000, 10000, step=1, splits=Some(1)) //Q4

== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#9L])
+- *(2) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#11L])
   +- *(2) Project
      +- *(2) BroadcastHashJoin [id#0L], [id#2L], Inner, BuildRight
         :- *(2) Range (100, 100000, step=1, splits=1)
         +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
            +- *(1) Range (2000, 10000, step=1, splits=1)
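They are bullet points that simply represent ordered, nested operations. This nested list:
- Header
  - Child 1
    - Grandchild 1
  - Child 2
    - Grandchild 2
    - Grandchild 3
  - Child 3
would be written in the plan tree notation as: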
Header
:- Child 1
:  +- Grandchild 1
:- Child 2
:  :- Grandchild 2
:  +- Grandchild 3
+- Child 3
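+- marks a direct child, usually the last one
:- marks a direct child that has siblings and is not the last one
:  +- marks the last grandchild, whose parent has siblings
:  :- marks a grandchild with siblings, whose parent is a non-final child that also has siblings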
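The rules above can be sketched as a small recursive printer. This is only an illustration of the notation, not Spark's own code: `Node`, `PlanTreeSketch` and `render` are hypothetical names introduced here, and the sketch assumes each nesting level is indented by three characters, as in the output above.

```scala
// Hypothetical minimal tree node used only for this sketch; not a Spark class.
final case class Node(name: String, children: List[Node] = Nil)

object PlanTreeSketch {
  // lastFlags has one entry per level below the root, ordered from the top down.
  // An entry is true when the node on the current path at that level is the
  // last child of its parent.
  def render(node: Node, lastFlags: List[Boolean] = Nil): List[String] = {
    val prefix =
      if (lastFlags.isEmpty) "" // the root gets no marker at all
      else {
        // For every ancestor level: keep a ":" column open while that ancestor
        // still has siblings below it, otherwise pad with spaces.
        val ancestors = lastFlags.init.map(isLast => if (isLast) "   " else ":  ").mkString
        // For the node itself: "+-" if it is the last child, ":-" otherwise.
        val own = if (lastFlags.last) "+- " else ":- "
        ancestors + own
      }
    val lastIndex = node.children.length - 1
    val childLines = node.children.zipWithIndex.flatMap { case (child, i) =>
      render(child, lastFlags :+ (i == lastIndex))
    }
    (prefix + node.name) :: childLines
  }
}

object PlanTreeSketchDemo extends App {
  val tree = Node("Header", List(
    Node("Child 1", List(Node("Grandchild 1"))),
    Node("Child 2", List(Node("Grandchild 2"), Node("Grandchild 3"))),
    Node("Child 3")
  ))
  PlanTreeSketch.render(tree).foreach(println)
}
```

Under these rules the root line carries no marker because it has no parent, an only child is always drawn with `+-` because it is also the last child, and the two `Range` inputs of a `Join` are drawn as `:-` and `+-` simply because the first is followed by a sibling and the second is not.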