SQL: conditional lead/lag with no ordering guarantee

Tags: sql, hive, hiveql, window-functions, gaps-and-islands

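How can I write a conditional lead/lag when the preceding or following row is not guaranteed to satisfy a given condition? In my case, I am looking at website traffic.

Sample data is below. prior_path and prior_event are the target fields; prior_event is the one I am having trouble deriving under the given conditions.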

+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
| sessionid | hit | type  |                 path                  | event |             prior_path             | prior_event |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
|      1001 |   1 | event | www.stackoverflow.com                 | hover |                                    |             |
|      1001 |   2 | page  | www.stackoverflow.com                 |       |                                    | hover       |
|      1001 |   3 | event | www.stackoverflow.com                 | load  |                                    |             |
|      1001 |   4 | event | www.stackoverflow.com                 | blur  |                                    | load        |
|      1001 |   5 | event | www.stackoverflow.com                 | click |                                    | blur        |
|      1001 |   6 | page  | www.stackoverflow.com/post/10         |       | www.stackoverflow.com              | click       |
|      1001 |   7 | event | www.stackoverflow.com/post/10#details | offer |                                    |             |
|      1001 |   8 | page  | www.stackoverflow.com/post/confirm    |       | www.stackoverflow.com/post/10      | offer       |
|      1001 |   9 | page  | www.stackoverflow.com/questions/10    |       | www.stackoverflow.com/post/confirm | offer       |
|      1001 |  10 | event | www.stackoverflow.com/questions/10    | exit  |                                    |             |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
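prior_path: the last path where type = page; populated only for page hits.
prior_event: the last event where type = event; populated for all hit types.

Note that for hits 8 and 9 the offer event is repeated, since it is the event that led to both of those pages.

What I have so far: prior_path seems fairly straightforward.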

SELECT LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit) FROM my_table
But I am not sure how to get prior_event.

I think you just need lag() and some conditional logic:

select . . .,
       (case when type = 'page'
             then lag(path) over (partition by sessionid, type order by hit)
        end) as prior_path,
       lag(event) over (partition by sessionid order by hit) as prior_event
from my_table;

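You already have the right expression for prior_path. You just need to wrap it in a conditional expression so that it is only returned for page rows.

prior_event is a bit more complicated. I would suggest the following approach:

For events, we can simply use lag().

For pages, one option is a gaps-and-islands technique: first define groups with a conditional running sum that increments each time an event row is met, then use first_value() within each group. Each group starts with an event row, so the first event in the group is the last event seen before the page.

This should do what you want: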

select  
    t.*,
    case when type = 'page'
        then lag(path) over(partition by sessionid, type  order by hit)
    end prior_path,
    case type 
        when 'page'
            then first_value(event) over(partition by sessionid, grp order by hit)
        when 'event' 
            then lag(event) over(partition by sessionid order by hit)
        end prior_event
from (
    select 
        t.*,
        sum(case when type = 'event' then 1 else 0 end) 
            over(partition by sessionid order by hit) grp
    from mytable t
) t
For lack of a Hive fiddle in the wild, I used Postgres, but this should work in Hive too:

sessionid | hit | type  | path                                  | event | grp | prior_path                         | prior_event
--------: | --: | :---- | :------------------------------------ | :---- | --: | :--------------------------------- | :----------
     1001 |   1 | event | www.stackoverflow.com                 | hover |   1 | null                               | null
     1001 |   2 | page  | www.stackoverflow.com                 | null  |   1 | null                               | hover
     1001 |   3 | event | www.stackoverflow.com                 | load  |   2 | null                               | null
     1001 |   4 | event | www.stackoverflow.com                 | blur  |   3 | null                               | load
     1001 |   5 | event | www.stackoverflow.com                 | click |   4 | null                               | blur
     1001 |   6 | page  | www.stackoverflow.com/post/10         | null  |   4 | www.stackoverflow.com              | click
     1001 |   7 | event | www.stackoverflow.com/post/10#details | offer |   5 | null                               | null
     1001 |   8 | page  | www.stackoverflow.com/post/confirm    | null  |   5 | www.stackoverflow.com/post/10      | offer
     1001 |   9 | page  | www.stackoverflow.com/questions/10    | null  |   5 | www.stackoverflow.com/post/confirm | offer
     1001 |  10 | event | www.stackoverflow.com/questions/10    | exit  |   6 | null                               | null
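
If you want to check the query locally without a Hive or Postgres instance, here is a minimal sketch that runs the same SQL against an in-memory SQLite database from Python. SQLite is only a stand-in here (an assumption on my part; it needs SQLite 3.25 or later for window functions), the `run` helper is a hypothetical name, and the rows are the sample data from the question.

```python
import sqlite3

# Sample rows from the question: (sessionid, hit, type, path, event)
ROWS = [
    (1001, 1, "event", "www.stackoverflow.com", "hover"),
    (1001, 2, "page", "www.stackoverflow.com", None),
    (1001, 3, "event", "www.stackoverflow.com", "load"),
    (1001, 4, "event", "www.stackoverflow.com", "blur"),
    (1001, 5, "event", "www.stackoverflow.com", "click"),
    (1001, 6, "page", "www.stackoverflow.com/post/10", None),
    (1001, 7, "event", "www.stackoverflow.com/post/10#details", "offer"),
    (1001, 8, "page", "www.stackoverflow.com/post/confirm", None),
    (1001, 9, "page", "www.stackoverflow.com/questions/10", None),
    (1001, 10, "event", "www.stackoverflow.com/questions/10", "exit"),
]

# Same logic as the answer's query: grp is a running count of event rows,
# so every page row shares a group with the event that precedes it.
QUERY = """
select hit, grp,
       case when type = 'page'
            then lag(path) over (partition by sessionid, type order by hit)
       end as prior_path,
       case type
            when 'page'  then first_value(event) over (partition by sessionid, grp order by hit)
            when 'event' then lag(event) over (partition by sessionid order by hit)
       end as prior_event
from (
    select t.*,
           sum(case when type = 'event' then 1 else 0 end)
               over (partition by sessionid order by hit) as grp
    from mytable t
) t
order by hit
"""


def run():
    """Load the sample rows into SQLite and return (hit, grp, prior_path, prior_event)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table mytable (sessionid int, hit int, type text, path text, event text)"
    )
    con.executemany("insert into mytable values (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```

The rows to look at are hits 8 and 9: both fall in group 5, whose first row is the offer event at hit 7, which is why first_value() repeats offer for both pages.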
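Works like a charm! Thanks for the explanation and for the pointer to the gaps-and-islands technique. It is new to me, so I appreciate having something more to research. Is there a way to make this work for the event that follows a page? For example, just as hits 8 and 9 get offer as their prior event, could this be modified so that hits 8 and 9 get the exit event from hit 10 as their next event? I tried combinations of LEAD and LAST_VALUE in the outer select, but they did not quite work.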
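
One possible way to approach the follow-up, offered as a sketch rather than something taken from the answers above: mirror the grouping by ordering on hit descending. The running sum then increments at each event when walking backwards, so every page shares a group with the event that follows it, and first_value() over the same descending order picks that event up. The names `grp_next`, `next_event` and `next_events` are made up for illustration, and this was only checked against SQLite (3.25 or later) from Python, not against Hive.

```python
import sqlite3

# (sessionid, hit, type, event) from the question's sample; path is not needed here.
ROWS = [
    (1001, 1, "event", "hover"), (1001, 2, "page", None),
    (1001, 3, "event", "load"), (1001, 4, "event", "blur"),
    (1001, 5, "event", "click"), (1001, 6, "page", None),
    (1001, 7, "event", "offer"), (1001, 8, "page", None),
    (1001, 9, "page", None), (1001, 10, "event", "exit"),
]

# grp_next counts event rows from the end of the session backwards, so a page
# row lands in the same group as the next event after it.
QUERY = """
select hit,
       case type
            when 'page'  then first_value(event) over (partition by sessionid, grp_next order by hit desc)
            when 'event' then lead(event) over (partition by sessionid order by hit)
       end as next_event
from (
    select t.*,
           sum(case when type = 'event' then 1 else 0 end)
               over (partition by sessionid order by hit desc) as grp_next
    from mytable t
) t
order by hit
"""


def next_events():
    """Return {hit: next_event} for the sample session."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (sessionid int, hit int, type text, event text)")
    con.executemany("insert into mytable values (?, ?, ?, ?)", ROWS)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    for hit, event in next_events().items():
        print(hit, event)
```

For event rows the sketch simply mirrors the lag() branch with lead(), which is an assumption about what you want there; drop that branch if only page rows matter.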