HiveQL query to find the delta between rows when a condition matches


I have some data in a data lake:

Person |  Date    |  Time    |  Number of Friends  |  
Bob    |  02/01   | unix_ts1 |  5                  |
Kate   |  02/01   | unix_ts2 |  2                  |
Jill   |  02/01   | unix_ts3 |  3                  |
Bob    |  02/01   | unix_ts3 |  7                  |
Kate   |  02/02   | unix_ts4 |  10                 |
Jill   |  01/29   | unix_ts0 |  1                  |
I want to produce a table like this:

Person |  Date    |  Time    |  Number of Friends DELTA  | Found Diff Between
Bob    |  02/01   | unix_ts1 |  NaN                      | (5, NaN)
Kate   |  02/01   | unix_ts2 |  NaN                      | (2, NaN)
Jill   |  02/01   | unix_ts3 |  2                        | (3, 1)
Bob    |  02/01   | unix_ts3 |  2                        | (7, 5)
Kate   |  02/02   | unix_ts4 |  8                        | (10, 2)
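So I have a table in which each row is identified by a person's name and the time the data was recorded. I want a query that finds every row for "Bob", takes the difference in Number of Friends between consecutive timestamps, and returns that delta together with the two values it was computed from. I want the same thing for every person.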

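I found a way to do this with the lag() function when there is only a single series of values, but that approach does not match rows by person. I also know how to do this in pandas if I download the data, but I would like to know whether it can be done directly in Hive.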


Is there a way to do this? Thanks, everyone!

Use the lag window function, partitioned by person and ordered by time:

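-- date and time are backticked because they may be reserved words, depending on the Hive version
select person
      ,`date`
      ,`time`
      ,num_friends - prev_friends as delta  -- NULL (the NaN in the question) on each person's first row
      -- concat_ws rejects int arguments and skips NULLs, so cast explicitly and substitute 'NaN'
      ,concat('(', cast(num_friends as string), ', ', coalesce(cast(prev_friends as string), 'NaN'), ')') as found_diff_between
from (
      select t.*
            ,lag(num_friends) over (partition by person order by `time`) as prev_friends
      from tbl t
) x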
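
To check what the per-person lag produces on the sample rows from the question, here is a small sketch that runs the same kind of windowed query against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here (it needs SQLite 3.25 or later for window functions). The column names day and ts, the helper compute_deltas, and the integer timestamps 0..4 replacing unix_ts0..unix_ts4 are assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question. The integer timestamps 0..4 are assumed
# stand-ins for the placeholders unix_ts0..unix_ts4.
ROWS = [
    ("Bob", "02/01", 1, 5),
    ("Kate", "02/01", 2, 2),
    ("Jill", "02/01", 3, 3),
    ("Bob", "02/01", 3, 7),
    ("Kate", "02/02", 4, 10),
    ("Jill", "01/29", 0, 1),
]

# lag() is evaluated separately inside each person's partition, so every row
# is compared with that same person's previous row in timestamp order.
QUERY = """
select person, day, ts, num_friends, prev_friends,
       num_friends - prev_friends as delta
from (
    select person, day, ts, num_friends,
           lag(num_friends) over (partition by person order by ts) as prev_friends
    from tbl
) t
order by ts, person
"""


def compute_deltas(rows):
    """Load rows into an in-memory table and return the per-person deltas."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tbl (person text, day text, ts integer, num_friends integer)"
    )
    conn.executemany("insert into tbl values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in compute_deltas(ROWS):
        print(row)
```

Each person's first row comes back with None for both prev_friends and delta, which corresponds to the NaN entries in the desired table. Jill's 01/29 row also appears in the result, since the query keeps every input row.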