HiveQL query to find the delta between rows when a condition matches
I have some data in a data lake:
Person | Date | Time | Number of Friends |
Bob | 02/01 | unix_ts1 | 5 |
Kate | 02/01 | unix_ts2 | 2 |
Jill | 02/01 | unix_ts3 | 3 |
Bob | 02/01 | unix_ts3 | 7 |
Kate | 02/02 | unix_ts4 | 10 |
Jill | 01/29 | unix_ts0 | 1 |
I would like to produce a table like this:
Person | Date | Time | Number of Friends DELTA | Found Diff Between
Bob | 02/01 | unix_ts1 | NaN | (5, NaN)
Kate | 02/01 | unix_ts2 | NaN | (2, NaN)
Jill | 02/01 | unix_ts3 | 2 | (3, 1)
Bob | 02/01 | unix_ts3 | 2 | (7, 5)
Kate | 02/02 | unix_ts4 | 8 | (10, 2)
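So I have a table in which each row is identified by a person's name and the time the data was recorded. I want a query that finds the rows for "Bob", computes the delta in Number of Friends between consecutive timestamps, and returns that delta together with the two values it was computed from. I want this for every person.
I found a way to do this with lag() when there is only a single series of values, but that approach does not match rows by person. I also know how to do this in pandas if I download the data, but I would like to know whether it can be done in Hive.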
Is there a way to do this? Thanks, everyone!

Use the lag window function, partitioned by person:
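-- `date` and `time` are backtick-quoted because they can clash with Hive keywords
select person
      ,`date`
      ,`time`
      -- difference from the same person's previous row; NULL on that person's first row
      ,num_friends - lag(num_friends) over (partition by person order by `time`) as delta
      -- concat_ws expects string arguments, so the integer counts are cast first
      ,concat_ws(',', cast(num_friends as string), cast(lag(num_friends) over (partition by person order by `time`) as string)) as found_diff_between
from tbl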
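
One detail worth knowing: concat_ws skips NULL arguments, so on a person's first row the pair column holds only the current value rather than something like (5, NaN). The variation below is a minimal sketch of one way to get the pair format from the question. It assumes the same tbl and num_friends names used in the answer; prev_friends and the subquery alias t are hypothetical names introduced only for illustration.

```sql
-- Sketch only: tbl and num_friends come from the answer above;
-- prev_friends and t are illustrative names, not part of the original schema.
select person
      ,`date`
      ,`time`
      -- NULL (not NaN) when the person has no earlier row
      ,num_friends - prev_friends as delta
      -- concat returns NULL if any argument is NULL, so the missing
      -- previous value is replaced with the literal text 'NaN' first
      ,concat('(', cast(num_friends as string), ', ',
              coalesce(cast(prev_friends as string), 'NaN'), ')') as found_diff_between
from (
  select person
        ,`date`
        ,`time`
        ,num_friends
        -- previous num_friends for the same person, ordered by time
        ,lag(num_friends) over (partition by person order by `time`) as prev_friends
  from tbl
) t
```

Computing lag once in a subquery avoids repeating the window expression. The delta column still comes back as NULL on each person's first row; wrap it in coalesce as well if a placeholder is wanted there.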