Optimizing a window query in Presto
I have a table with fields such as user_id, col1, col2, col3, updated_at, is_deleted, and day. The current query looks like this:
SELECT DISTINCT
    user_id,
    first_value(col1) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col1,
    first_value(col2) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col2,
    first_value(col3) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS col3,
    bool_or(is_deleted) IGNORE NULLS OVER (
        PARTITION BY user_id
        ORDER BY updated_at DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS is_deleted
FROM
    my_table
WHERE
    day >= '2021-05-25'
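Basically, for each user_id I need the latest (first) value of each column. Because each of the value columns can be NULL, I have to run the same window computation several times (once per column).
Currently, 66% of the query time is spent on the window functions.
Is there any way to optimize this?

It seems you want something like this: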
select *
from (
    select *, row_number() over (partition by user_id order by updated_at desc) rn
    from my_table
    where day >= '2021-05-25'
) t
where rn = 1
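This can return NULL values, which the OP wants to avoid: it keeps the single most recent row per user_id, so any column that is NULL in that row stays NULL even if an older row has a value for it.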
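
One possible way to get the latest non-NULL value per column without window functions is a plain GROUP BY with max_by and a FILTER clause. This is only a sketch, not something taken from the thread: it assumes your Presto version supports both max_by and FILTER on aggregates, and it reuses the table and column names from the question. Whether it is actually faster should be checked with EXPLAIN ANALYZE on your own data.

```sql
-- Sketch: one aggregation pass per user_id instead of several window evaluations.
SELECT
    user_id,
    -- FILTER removes rows where the column is NULL before max_by sees them,
    -- so the result is the value from the most recent row that has one.
    max_by(col1, updated_at) FILTER (WHERE col1 IS NOT NULL) AS col1,
    max_by(col2, updated_at) FILTER (WHERE col2 IS NOT NULL) AS col2,
    max_by(col3, updated_at) FILTER (WHERE col3 IS NOT NULL) AS col3,
    -- As an aggregate, bool_or skips NULL inputs on its own.
    bool_or(is_deleted) AS is_deleted
FROM
    my_table
WHERE
    day >= '2021-05-25'
GROUP BY
    user_id
```

If two rows for the same user_id share the same updated_at, which one max_by picks is not defined, just as the row order among ties is not defined in the original first_value query.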