Hadoop Hive LATERAL VIEW EXPLODE with a WHERE clause: what runs first?
I am trying to understand whether, in Hive, the WHERE clause runs before or after the LATERAL VIEW EXPLODE. For example, if I have:
SELECT *
FROM
(
  SELECT
    a1,
    a2,
    b.ds,
    conv_list.threshold_conv[0] AS t
  FROM
    t1 b
    LATERAL VIEW EXPLODE({list}) conv_list AS threshold_conv
  WHERE
    b.ds BETWEEN '{DATE-29}' AND '{DATE}'
) sub  -- Hive requires an alias for a subquery in FROM
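Does the ds filter run before or after the lateral view explode?
- If the filter column is a partition column of the table, the filter is applied first (partition pruning). That is the main purpose of partitions, and it works even when the WHERE clause is not inside the subquery (predicate pushdown).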
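- A lateral view can be an expensive operation, so Hive applies the filter before the lateral view. See the following execution plan, which is based on a query similar to yours (not identical):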
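- If your filter also uses fields from the exploded array, I assume Hive first applies every filter that does not reference any exploded column (before the lateral view), and then applies the remaining filters to the exploded data.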
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            filterExpr: ((mycolumndpartitioned > 0) and (mycolumn= 112623934)) (type: boolean)
            Statistics: Num rows: 23953585 Data size: 52793067242 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (mycolumn= 112623934) (type: boolean)
              Statistics: Num rows: 11976792 Data size: 26396532519 Basic stats: COMPLETE Column stats: NONE
              Lateral View Forward
                Statistics: Num rows: 11976792 Data size: 26396532519 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  Statistics: Num rows: 11976792 Data size: 26396532519 Basic stats: COMPLETE Column stats: NONE
                  Lateral View Join Operator
                    outputColumnNames: _col13
                    Statistics: Num rows: 23953584 Data size: 52793065038 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: _col13.myArray (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 23953584 Data size: 52793065038 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 23953584 Data size: 52793065038 Basic stats: COMPLETE Column stats: NONE
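To check the order for your own query, you can prefix it with EXPLAIN and look at where each predicate lands relative to the Lateral View Forward and Lateral View Join Operator nodes. This is a sketch only: the table t1 (assumed partitioned by ds), the array<int> column conv_array, and the dates are hypothetical placeholders, not taken from the question.

```sql
-- Hypothetical schema: t1 is partitioned by ds and has an array<int> column conv_array.
-- Predicate pushdown; normally enabled by default, set here only to be explicit.
SET hive.optimize.ppd=true;

EXPLAIN
SELECT
  b.a1,
  b.a2,
  b.ds,
  conv_list.threshold_conv AS t
FROM t1 b
LATERAL VIEW EXPLODE(b.conv_array) conv_list AS threshold_conv
WHERE b.ds BETWEEN '2024-01-01' AND '2024-01-30'  -- references base-table columns only
  AND conv_list.threshold_conv > 0;               -- references the exploded column
```

If Hive behaves as described above, `threshold_conv > 0` should appear in a Filter Operator after the Lateral View Join Operator, while the ds predicate is handled before the lateral view: as partition pruning (which may leave no runtime Filter Operator for it at all) or, for a non-partition column, in a Filter Operator above Lateral View Forward, as in the plan above. The exact plan shape can vary with the Hive version and execution engine.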