SQL: conditional lead/lag with no sequence guarantees
Tags: sql, hive, hiveql, window-functions, gaps-and-islands

How would one write a conditional lead/lag when the preceding or following row is not guaranteed to meet a given condition? In my case, I'm looking at web traffic.

Sample data: prior_path and prior_event are the target fields. I'm having a hard time getting to prior_event under the conditions described below the table.
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
| sessionid | hit | type | path | event | prior_path | prior_event |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
| 1001 | 1 | event | www.stackoverflow.com | hover | | |
| 1001 | 2 | page | www.stackoverflow.com | | | hover |
| 1001 | 3 | event | www.stackoverflow.com | load | | |
| 1001 | 4 | event | www.stackoverflow.com | blur | | load |
| 1001 | 5 | event | www.stackoverflow.com | click | | blur |
| 1001 | 6 | page | www.stackoverflow.com/post/10 | | www.stackoverflow.com | click |
| 1001 | 7 | event | www.stackoverflow.com/post/10#details | offer | | |
| 1001 | 8 | page | www.stackoverflow.com/post/confirm | | www.stackoverflow.com/post/10 | offer |
| 1001 | 9 | page | www.stackoverflow.com/questions/10 | | www.stackoverflow.com/post/confirm | offer |
| 1001 | 10 | event | www.stackoverflow.com/questions/10 | exit | | |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
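prior_path: the last path where type = page; populated only for page hits.
prior_event: the last event where type = event; populated for all hit types.
Note that for hit 8 and hit 9 the offer event is repeated, because it is the event that led to both of those pages.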
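For anyone who wants to reproduce the problem, the sample rows can be loaded with something like the following. This is only a sketch: the column types are assumptions (the question does not give a schema), page hits are stored with a NULL event, and the table is named my_table as in the question's own query (the answer's demo calls it mytable). It is written for Postgres; Hive may need small adjustments.

```sql
-- Hypothetical setup: column types are assumed, not taken from the question.
create table my_table (
    sessionid int,
    hit       int,
    type      varchar(10),
    path      varchar(100),
    event     varchar(20)
);

-- Sample rows copied from the table above; page hits have no event (NULL).
insert into my_table (sessionid, hit, type, path, event) values
    (1001,  1, 'event', 'www.stackoverflow.com',                 'hover'),
    (1001,  2, 'page',  'www.stackoverflow.com',                 null),
    (1001,  3, 'event', 'www.stackoverflow.com',                 'load'),
    (1001,  4, 'event', 'www.stackoverflow.com',                 'blur'),
    (1001,  5, 'event', 'www.stackoverflow.com',                 'click'),
    (1001,  6, 'page',  'www.stackoverflow.com/post/10',         null),
    (1001,  7, 'event', 'www.stackoverflow.com/post/10#details', 'offer'),
    (1001,  8, 'page',  'www.stackoverflow.com/post/confirm',    null),
    (1001,  9, 'page',  'www.stackoverflow.com/questions/10',    null),
    (1001, 10, 'event', 'www.stackoverflow.com/questions/10',    'exit');
```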
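What I have so far: prior_path seems straightforward:
SELECT LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit) FROM my_table
But I'm not sure how to get prior_event.

I think you just need lag() and some conditional logic: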
select . . .,
       (case when type = 'page'
             then lag(path) over (partition by sessionid, type order by hit)
        end) as prior_path,
       lag(event) over (partition by sessionid order by hit) as prior_event
from my_table;
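
You already have the correct expression for prior_path. You just need to wrap it in a conditional expression.

prior_event is indeed a bit more complicated. I would suggest the following approach:
for events, we can simply use lag();
for pages, one option is a gaps-and-islands technique: first define groups with a conditional sum that increments every time an event row is met, then use first_value() within each group. Every page then lands in the group opened by the most recent event, and that event is the group's first row.

This should do what you want: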
select
    t.*,
    case when type = 'page'
        then lag(path) over(partition by sessionid, type order by hit)
    end prior_path,
    case type
        when 'page'
            then first_value(event) over(partition by sessionid, grp order by hit)
        when 'event'
            then lag(event) over(partition by sessionid order by hit)
    end prior_event
from (
    select
        t.*,
        sum(case when type = 'event' then 1 else 0 end)
            over(partition by sessionid order by hit) grp
    from mytable t
) t
For lack of a Hive fiddle in the wild, I used Postgres for the demo, but this should work in Hive as well:
sessionid | hit | type | path | event | grp | prior_path | prior_event
--------: | --: | :---- | :------------------------------------ | :---- | --: | :--------------------------------- | :----------
1001 | 1 | event | www.stackoverflow.com | hover | 1 | null | null
1001 | 2 | page | www.stackoverflow.com | null | 1 | null | hover
1001 | 3 | event | www.stackoverflow.com | load | 2 | null | null
1001 | 4 | event | www.stackoverflow.com | blur | 3 | null | load
1001 | 5 | event | www.stackoverflow.com | click | 4 | null | blur
1001 | 6 | page | www.stackoverflow.com/post/10 | null | 4 | www.stackoverflow.com | click
1001 | 7 | event | www.stackoverflow.com/post/10#details | offer | 5 | null | null
1001 | 8 | page | www.stackoverflow.com/post/confirm | null | 5 | www.stackoverflow.com/post/10 | offer
1001 | 9 | page | www.stackoverflow.com/questions/10 | null | 5 | www.stackoverflow.com/post/confirm | offer
1001 | 10 | event | www.stackoverflow.com/questions/10 | exit | 6 | null | null
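In Hive specifically, the grouping subquery can probably be avoided, because Hive's last_value() accepts an optional second boolean argument that skips NULLs. The following is a sketch of that alternative rather than the method used above. It assumes page rows carry NULL (not an empty string) in event, it uses the question's table name my_table, and since this form is Hive-only it does not run on Postgres and was not checked against the demo.

```sql
select
    t.*,
    case when type = 'page'
        then lag(path) over(partition by sessionid, type order by hit)
    end as prior_path,
    case type
        when 'page'
            -- Hive only: the second argument (true) makes last_value skip NULLs,
            -- so this returns the most recent non-null event up to this hit.
            then last_value(event, true) over(partition by sessionid order by hit)
        when 'event'
            -- Unchanged from the query above: the event on the previous hit, if any.
            then lag(event) over(partition by sessionid order by hit)
    end as prior_event
from my_table t;
```

The subquery version above has the advantage of being portable to engines that lack a skip-nulls option.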
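
Works like a charm! Thanks for the explanation and for the pointer to the gaps-and-islands technique. It's new to me, so I'd appreciate any further references for study. Is there a way to make this work for the lead event of a page? For example, just as hits 8 and 9 get offer as their prior event, could this be modified so that the exit event of hit 10 becomes the next event for hits 8 and 9? In the outer select I tried a combination of LEAD and LAST_VALUE, but it didn't quite work.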
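
One possible way to get the next event, offered as a sketch rather than a confirmed answer: mirror the same gaps-and-islands idea by building the groups in descending hit order, so that each page falls into the group opened by the event that follows it. The names grp_next and next_event are made up for illustration, and my_table is the question's table name.

```sql
select
    t.*,
    case when type = 'page'
        -- Within each backward-built group, the first row in descending hit
        -- order is the event row that comes after the page.
        then first_value(event) over(partition by sessionid, grp_next order by hit desc)
    end as next_event
from (
    select
        t.*,
        -- Running count of event rows, scanning from the last hit backwards.
        sum(case when type = 'event' then 1 else 0 end)
            over(partition by sessionid order by hit desc) as grp_next
    from my_table t
) t
order by sessionid, hit;
```

On the sample rows this should give load for hit 2, offer for hit 6, and exit for hits 8 and 9. Pages with no later event in the session would get NULL.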