SQL: Need help rewriting this query
We run this query daily in production. It does a lot of joins and also uses window functions in Hive. We tried adding some set options, but they didn't help much. The structure is as follows:
SELECT
    C.f1, C.f2, A.f2 ...
FROM (
    SELECT * FROM (
        SELECT T1.*, B.atid, B.a_id,
               ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
        FROM T1 AS T1
        JOIN T5 ON T1.t_dt = T5.t_dt
        JOIN T2 B ON T1.wtid = B.wtid and T1.b_ts = B.b_ts
        LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
            ON T1.TYP = PV.p_cd
        WHERE T1.state not in ("INVALID")
          AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
          AND ISNULL(PV.p_cd)
    ) T
    WHERE T.rank_ = 1
) A
JOIN (SELECT *, row_number() over (partition by ac_id order by b_ts desc) rank_
      FROM T4
      WHERE event not in ('CT','UPD')
     ) AS C
  ON A.a_id = C.a_id
 AND A.atid = C.ac_id
 AND C.rank_ = 1
JOIN T6 ON C.t_dt = T6.t_dt
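- Since I can't drop any of the tables (or joins), my approach was to replace the window functions with another join using the aggregate function max, but I couldn't manage to rewrite it.
- Also, I'm not sure this would necessarily improve performance, so any guidance would help us.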
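Here is a minimal sketch of the max-plus-join rewrite from the first bullet, applied to subquery C. It uses Python's built-in `sqlite3` as a stand-in for Hive, which needs SQLite 3.25 or later for window functions. The sample rows and the `f1` values are made up.

Taking MAX(b_ts) per ac_id and joining back only matches ROW_NUMBER() ... = 1 when (ac_id, b_ts) is unique among the filtered rows. On a tie, ROW_NUMBER() keeps one arbitrary row, while the join keeps every tied row. The rewrite also references T4 twice, so it is not guaranteed to be faster.

```python
import sqlite3

# Window-function version of subquery C from the question (SQLite standing in for Hive).
WINDOW_SQL = """
SELECT ac_id, a_id, b_ts, f1 FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ac_id ORDER BY b_ts DESC) AS rank_
    FROM T4
    WHERE event NOT IN ('CT', 'UPD')
) C
WHERE rank_ = 1
"""

# Aggregate rewrite: find MAX(b_ts) per ac_id, then join back to fetch the full row.
MAX_JOIN_SQL = """
SELECT t.ac_id, t.a_id, t.b_ts, t.f1
FROM T4 t
JOIN (
    SELECT ac_id, MAX(b_ts) AS max_ts
    FROM T4
    WHERE event NOT IN ('CT', 'UPD')
    GROUP BY ac_id
) m ON t.ac_id = m.ac_id AND t.b_ts = m.max_ts
WHERE t.event NOT IN ('CT', 'UPD')  -- repeat the filter, or an excluded row with the max b_ts could come back
"""


def run(rows, sql):
    """Load hypothetical T4 rows into an in-memory DB and return the sorted query output."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T4 (ac_id TEXT, a_id TEXT, b_ts INTEGER, event TEXT, f1 TEXT)")
    con.executemany("INSERT INTO T4 VALUES (?, ?, ?, ?, ?)", rows)
    result = sorted(con.execute(sql).fetchall())
    con.close()
    return result


# Made-up sample data with no ties on (ac_id, b_ts) among the surviving rows.
NO_TIES = [
    ("x", "a1", 1, "NEW", "old"),
    ("x", "a1", 2, "NEW", "latest"),
    ("x", "a1", 3, "UPD", "filtered out"),
    ("y", "a2", 5, "NEW", "only row"),
]

# Same data plus a second "x" row sharing the latest surviving timestamp.
WITH_TIE = NO_TIES + [("x", "a1", 2, "NEW", "tied")]

if __name__ == "__main__":
    print(run(NO_TIES, WINDOW_SQL))
    print(run(NO_TIES, MAX_JOIN_SQL))
    print(len(run(WITH_TIE, WINDOW_SQL)), len(run(WITH_TIE, MAX_JOIN_SQL)))
```

With unique timestamps, both queries return the same rows. With the tie, the window version still returns one row per ac_id, but the join version returns both tied "x" rows.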
The ISNULL(PV.p_cd) condition filters out some of the rows from T1.
The same goes for these conditions:
WHERE T1.state not in ("INVALID")
AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
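Move the LEFT JOIN with T3, together with those filters, into a subquery on T1. If it filters out a lot of rows, this can shrink the T1 dataset before all the other joins and the row_number().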
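Here is a minimal sketch of pushing the T1 filters, including the anti-join against T3, into a subquery that runs before the join with T2. SQLite again stands in for Hive. The schemas keep only the columns that the filters and joins touch, and the rows are invented. ISNULL(PV.p_cd) is written as PV.p_cd IS NULL, which both SQLite and Hive accept.

```python
import sqlite3

# Hypothetical minimal schemas: only the columns the filters and joins touch.
SCHEMA = """
CREATE TABLE T1 (wtid TEXT, b_ts INTEGER, TYP TEXT, state TEXT, evt_name TEXT);
CREATE TABLE T2 (wtid TEXT, b_ts INTEGER, atid TEXT);
CREATE TABLE T3 (p_cd TEXT, PV_TY_CD TEXT);
"""

# T1 filters, including the LEFT JOIN + IS NULL anti-join against T3.
PREFILTERED_T1 = """
SELECT T1.* FROM T1
LEFT JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV ON T1.TYP = PV.p_cd
WHERE T1.state NOT IN ('INVALID')
  AND T1.evt_name NOT IN ('INACTIVE', 'DORMANT')
  AND PV.p_cd IS NULL
"""

# Original shape: join T2 first, filter afterwards.
FILTER_AFTER = """
SELECT T1.wtid, T1.b_ts, B.atid
FROM T1
JOIN T2 B ON T1.wtid = B.wtid AND T1.b_ts = B.b_ts
LEFT JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV ON T1.TYP = PV.p_cd
WHERE T1.state NOT IN ('INVALID')
  AND T1.evt_name NOT IN ('INACTIVE', 'DORMANT')
  AND PV.p_cd IS NULL
"""

# Suggested shape: only pre-filtered T1 rows reach the join with T2.
FILTER_FIRST = f"""
SELECT T1.wtid, T1.b_ts, B.atid
FROM ({PREFILTERED_T1}) AS T1
JOIN T2 B ON T1.wtid = B.wtid AND T1.b_ts = B.b_ts
"""


def build():
    """Create an in-memory database with invented rows for T1, T2 and T3."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO T1 VALUES (?, ?, ?, ?, ?)", [
        ("w1", 1, "A", "OK", "E"),        # survives every filter
        ("w1", 2, "ORIG", "OK", "E"),     # removed by the anti-join (TYP is an ORIG_CD code)
        ("w2", 1, "A", "INVALID", "E"),   # removed by the state filter
        ("w3", 1, "A", "OK", "DORMANT"),  # removed by the evt_name filter
    ])
    con.executemany("INSERT INTO T2 VALUES (?, ?, ?)", [
        ("w1", 1, "at1"), ("w1", 2, "at1"), ("w2", 1, "at2"), ("w3", 1, "at3"),
    ])
    con.executemany("INSERT INTO T3 VALUES (?, ?)", [("ORIG", "ORIG_CD"), ("A", "OTHER")])
    return con


if __name__ == "__main__":
    con = build()
    print(sorted(con.execute(FILTER_AFTER).fetchall()))
    print(sorted(con.execute(FILTER_FIRST).fetchall()))
    print(con.execute(f"SELECT COUNT(*) FROM ({PREFILTERED_T1})").fetchone()[0])
```

Both shapes return the same rows. The difference is that only one of the four T1 rows reaches the join with T2.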
Also, the first row_number() is computed only over the T1 and B tables:
PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC
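Consider joining T5 after the row_number filter. If this join is heavy and the row_number filter reduces the dataset, wrap the row_number and its filter in a subquery, then join that filtered subquery with T5: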
(--filtered by row_number
  SELECT * FROM (
    SELECT T1.*, B.atid, B.a_id,
           ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
    FROM (
      --T1 filtered before any other join
      SELECT T1.* FROM T1
      LEFT JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
        ON T1.TYP = PV.p_cd
      WHERE T1.state NOT IN ("INVALID")
        AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
        AND ISNULL(PV.p_cd)
    ) AS T1
    JOIN T2 B ON T1.wtid = B.wtid AND T1.b_ts = B.b_ts
  ) T
  WHERE T.rank_ = 1
) T --filtered
JOIN T5 ON T.t_dt = T5.t_dt
This may help, depending on your data.
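Whether it is safe to move T5 after the ROW_NUMBER() filter depends on the data. The two placements return the same rows only if every T1.t_dt has exactly one match in T5.

- If a t_dt is missing from T5, the original query falls back to the next most recent row in that partition, while the rewrite drops the partition entirely.
- If T5 has duplicate t_dt values, the rewrite can return duplicate rows.

This sketch checks the missing-date case, with SQLite standing in for Hive, invented T1/T2 rows, and hypothetical T5 date lists passed in as a parameter.

```python
import sqlite3

# Original placement: T5 is joined before ROW_NUMBER(), so it can change which row ranks first.
T5_BEFORE = """
SELECT wtid, atid, b_ts FROM (
    SELECT T1.wtid, B.atid, T1.b_ts,
           ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS rank_
    FROM T1
    JOIN T5 ON T1.t_dt = T5.t_dt
    JOIN T2 B ON T1.wtid = B.wtid AND T1.b_ts = B.b_ts
) T
WHERE rank_ = 1
"""

# Suggested placement: rank and filter T1 x T2 in a subquery, then join the result to T5.
T5_AFTER = """
SELECT T.wtid, T.atid, T.b_ts FROM (
    SELECT * FROM (
        SELECT T1.wtid, B.atid, T1.b_ts, T1.t_dt,
               ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS rank_
        FROM T1
        JOIN T2 B ON T1.wtid = B.wtid AND T1.b_ts = B.b_ts
    ) R
    WHERE R.rank_ = 1
) T
JOIN T5 ON T.t_dt = T5.t_dt
"""


def run(sql, t5_dates):
    """Run a query against invented T1/T2 rows and the given T5 dates; return sorted rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE T1 (wtid TEXT, b_ts INTEGER, t_dt TEXT);
        CREATE TABLE T2 (wtid TEXT, b_ts INTEGER, atid TEXT);
        CREATE TABLE T5 (t_dt TEXT);
    """)
    con.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                    [("w1", 1, "d1"), ("w1", 2, "d2"), ("w2", 1, "d1")])
    con.executemany("INSERT INTO T2 VALUES (?, ?, ?)",
                    [("w1", 1, "at1"), ("w1", 2, "at1"), ("w2", 1, "at2")])
    con.executemany("INSERT INTO T5 VALUES (?)", [(d,) for d in t5_dates])
    rows = sorted(con.execute(sql).fetchall())
    con.close()
    return rows


if __name__ == "__main__":
    # Every t_dt present exactly once in T5: both placements agree.
    print(run(T5_BEFORE, ["d1", "d2"]), run(T5_AFTER, ["d1", "d2"]))
    # "d2" missing from T5: the results diverge for partition (w1, at1).
    print(run(T5_BEFORE, ["d1"]), run(T5_AFTER, ["d1"]))
```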
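If you are not running it on Tez, try it: the more complex the query, the bigger the improvement Tez brings. If the query can't be improved further, tune parallelism.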