In PostgreSQL, how do I find 3 consecutive events within a time range, keeping only the first event of each?
I have the table below listing user id, timestamp and event id. The column "tag" indicates whether a row is desirable (tag=1) or not (tag=0):

So I want to find:

- The number of times a user has 3 correct (tag=1) consecutive events (i.e. triplets) within the same date
- The timestamp of the first event of each of these triplets

Ideally, the returned table should look like this:
user_id | first_occurrence |event_id | consecutive_events
46 | 2018-12-23 06:11:35.000 | 7 | 2 <-- 2 consecutive triplets
46 | 2018-12-23 07:37:35.000 | 10 | 2 <-- this has 4 consecutive events but I am only interested in triplets of events.
122| 2018-12-23 06:11:35.000 | 4 | 1
122| 2018-12-28 06:38:35.000 | 2 | 1
I tried using the DENSE_RANK() function, but the results are far from optimal:
dense_rank() over (partition by user_id, date(timestamp) order by tag,date(timestamp) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
[UPDATE]
The example I mentioned in my first comment on Gordon's answer is the following. For these consecutive events:
user_id | timestamp | event_id | tag
46 | 2018-12-23 06:11:35.000 | 7 | 1
46 | 2018-12-23 07:51:35.000 | 8 | 1
46 | 2018-12-23 07:26:35.000 | 9 | 1
46 | 2018-12-23 07:37:35.000 | 10 | 1
46 | 2018-12-23 08:05:35.000 | 11 | 1
46 | 2018-12-23 08:20:35.000 | 12 | 1
46 | 2018-12-23 09:10:35.000 | 13 | 1
The query returns:
user_id | min(timestamp) | min_event_id | num_consecutive
46 | 2018-12-23 06:11:35.000 | 7 | 2
It should also return:
user_id | min(timestamp) | min_event_id | num_consecutive
46 | 2018-12-23 06:11:35.000 | 7 | 2
46 | 2018-12-23 07:37:35.000 | 10 | 2
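Do you think this can be fetched as well?

This is a gaps-and-islands problem. The difference of row numbers is probably the best approach. To get all adjacent values: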
select user_id, min(timestamp) as timestamp,
count(*) as num_consecutive,
min(event_id) as min_event_id
from (select t.*,
row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
from t
) t
group by user_id, timestamp::date, tag, (seqnum - seqnum_t);
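To see why seqnum - seqnum_t marks the islands, here is a sketch that prints both row numbers side by side. The rows are made up for illustration and inlined as a CTE named t, so the sketch runs without a real table. The column names follow the answer.

```sql
-- Hypothetical rows: one tag = 0 row breaks the run of tag = 1 rows.
WITH t (user_id, timestamp, event_id, tag) AS (
    VALUES (1, timestamp '2018-12-23 06:00', 1, 1),
           (1, timestamp '2018-12-23 06:10', 2, 1),
           (1, timestamp '2018-12-23 06:20', 3, 0),
           (1, timestamp '2018-12-23 06:30', 4, 1),
           (1, timestamp '2018-12-23 06:40', 5, 1),
           (1, timestamp '2018-12-23 06:50', 6, 1)
)
SELECT event_id, tag, seqnum, seqnum_t, seqnum - seqnum_t AS island
FROM (SELECT t.*,
             -- numbers every row of the user's day
             row_number() OVER (PARTITION BY user_id, timestamp::date ORDER BY timestamp) AS seqnum,
             -- numbers only the rows of that day that share the same tag
             row_number() OVER (PARTITION BY user_id, timestamp::date, tag ORDER BY timestamp) AS seqnum_t
      FROM t
     ) s
ORDER BY timestamp;

-- event_id | tag | seqnum | seqnum_t | island
--        1 |   1 |      1 |        1 |      0
--        2 |   1 |      2 |        2 |      0
--        3 |   0 |      3 |        1 |      2
--        4 |   1 |      4 |        3 |      1
--        5 |   1 |      5 |        4 |      1
--        6 |   1 |      6 |        5 |      1
```

Inside an unbroken run of equal tags, both counters advance together, so the difference stays constant. A row with the other tag pushes seqnum ahead of seqnum_t, which starts a new value. The same difference can occur for both tags, which is why the answer also groups by tag.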
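If you want each sequence separately, just add where tag = 1 and having count(*) >= 3 to this query. The where tag = 1 belongs in the outer query, after the subquery. If you filter inside the subquery, both row_number() values are computed over tag = 1 rows only. In that case seqnum - seqnum_t is 0 for every row, and all of a user's tag = 1 rows for a day collapse into one group, even across tag = 0 gaps.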
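Here is a sketch of the filtered islands query, applied to the seven rows from the update. The rows are inlined as a CTE named t so the sketch runs without a real table. The important part is that where tag = 1 sits outside the row-numbering subquery.

```sql
-- The update's rows (all tag = 1), inlined as a CTE named t.
WITH t (user_id, timestamp, event_id, tag) AS (
    VALUES (46, timestamp '2018-12-23 06:11:35',  7, 1),
           (46, timestamp '2018-12-23 07:51:35',  8, 1),
           (46, timestamp '2018-12-23 07:26:35',  9, 1),
           (46, timestamp '2018-12-23 07:37:35', 10, 1),
           (46, timestamp '2018-12-23 08:05:35', 11, 1),
           (46, timestamp '2018-12-23 08:20:35', 12, 1),
           (46, timestamp '2018-12-23 09:10:35', 13, 1)
)
SELECT user_id, min(timestamp) AS timestamp,
       count(*) AS num_consecutive,
       min(event_id) AS min_event_id
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY user_id, timestamp::date ORDER BY timestamp) AS seqnum,
             row_number() OVER (PARTITION BY user_id, timestamp::date, tag ORDER BY timestamp) AS seqnum_t
      FROM t
     ) t
WHERE tag = 1                  -- filter only after both row numbers have been computed
GROUP BY user_id, timestamp::date, tag, (seqnum - seqnum_t)
HAVING count(*) >= 3;          -- keep only islands long enough to hold a triplet

-- user_id | timestamp           | num_consecutive | min_event_id
--      46 | 2018-12-23 06:11:35 |               7 |            7
```

All seven events form a single island, so the query returns one row. That row carries the island's first timestamp, and min(timestamp) cannot point at the start of the second triplet inside that island.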
To turn this into the result set you want, use a subquery:
select user_id,
       min(min_event_id) as min_event_id,        -- the subquery exposes min_event_id, not event_id
       min(timestamp) as timestamp,
       sum(num_consecutive / 3) as num_triplets  -- full triplets per island, then summed per day
from (select user_id, min(timestamp) as timestamp,
             count(*) as num_consecutive,
             min(event_id) as min_event_id
      from (select t.*,
                   row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
                   row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
            from t
           ) t
      where tag = 1
      group by user_id, timestamp::date, tag, (seqnum - seqnum_t)
     ) t
where num_consecutive >= 3
group by user_id, timestamp::date;
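How is consecutiveness defined? By event_id or by timestamp?
@GordonLinoff By timestamp. Thank you very much, Gordon. This is the first time I have come across a gaps-and-islands problem, and I really like your answer. It solves about 90% of the problem. The only issue is that min(timestamp) does not return the "first" timestamp of the second triplet. Please see my updated question for a detailed example of this case. Thanks again!
Thanks for the update! I have already tried adding where tag=1 in the first subquery and count(*)>=3, but it fetches the wrong results. Apart from some triplets going missing, this query does not return the first timestamp of the second triplet. It returns the timestamp of the last consecutive event instead (which, by the way, is not a triplet).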
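One way to get the first timestamp of every triplet, rather than of every island, is to number the rows inside each island. You then keep rows 1, 4, 7, ... that still have two more rows after them. This is a sketch, not Gordon's query, and it rests on several assumptions:

- consecutiveness is by timestamp (as stated in the comments);
- triplets are non-overlapping and counted from the start of each island;
- the last column is the number of full triplets in the island.

The column names pos, island_size and triplets_in_island are hypothetical. The update's rows are inlined as a CTE named t.

```sql
WITH t (user_id, timestamp, event_id, tag) AS (
    VALUES (46, timestamp '2018-12-23 06:11:35',  7, 1),
           (46, timestamp '2018-12-23 07:51:35',  8, 1),
           (46, timestamp '2018-12-23 07:26:35',  9, 1),
           (46, timestamp '2018-12-23 07:37:35', 10, 1),
           (46, timestamp '2018-12-23 08:05:35', 11, 1),
           (46, timestamp '2018-12-23 08:20:35', 12, 1),
           (46, timestamp '2018-12-23 09:10:35', 13, 1)
),
numbered AS (
    -- the answer's island detection, computed over all rows (both tags)
    SELECT t.*,
           row_number() OVER (PARTITION BY user_id, timestamp::date ORDER BY timestamp) AS seqnum,
           row_number() OVER (PARTITION BY user_id, timestamp::date, tag ORDER BY timestamp) AS seqnum_t
    FROM t
),
islands AS (
    -- WHERE runs before the window functions of this SELECT, but after
    -- seqnum/seqnum_t were computed in the previous CTE
    SELECT n.*,
           row_number() OVER (PARTITION BY user_id, timestamp::date, seqnum - seqnum_t
                              ORDER BY timestamp) AS pos,          -- position inside the island
           count(*) OVER (PARTITION BY user_id, timestamp::date, seqnum - seqnum_t)
                                                   AS island_size  -- length of the island
    FROM numbered n
    WHERE tag = 1
)
SELECT user_id,
       timestamp       AS first_occurrence,
       event_id,
       island_size / 3 AS triplets_in_island   -- integer division: full triplets only
FROM islands
WHERE pos % 3 = 1              -- rows 1, 4, 7, ... of an island open a triplet
  AND pos + 2 <= island_size   -- ... provided two more rows follow in the same island
ORDER BY user_id, first_occurrence;

-- user_id | first_occurrence    | event_id | triplets_in_island
--      46 | 2018-12-23 06:11:35 |        7 |                  2
--      46 | 2018-12-23 07:51:35 |        8 |                  2
```

Ordered by timestamp, event 8 (07:51:35) falls after events 9 and 10. The triplets are therefore 7, 9, 10 and 8, 11, 12, and event 13 is left over. If you order by event_id instead (change ORDER BY timestamp to ORDER BY event_id in all three window definitions), the triplets become 7 to 9 and 10 to 12. That reproduces the expected rows for events 7 and 10 in the update.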