In PostgreSQL, how do I find 3 consecutive events within a time range and keep only the first event of each?

sql postgresql

I have a table listing user_id, timestamp and event_id. The column tag indicates whether a row is desirable (tag=1) or not (tag=0).

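So, I want to find:

the number of times a user has 3 correct (tag=1) consecutive events (i.e. a triplet) within the same date

the timestamp of the first event of each of these triplets

Ideally, the returned table should look like this: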

user_id | first_occurrence        | event_id | consecutive_events
     46 | 2018-12-23 06:11:35.000 | 7        | 2    <-- 2 consecutive triplets
     46 | 2018-12-23 07:37:35.000 | 10       | 2    <-- this has 4 consecutive events, but I am only interested in triplets of events
    122 | 2018-12-23 06:11:35.000 | 4        | 1
    122 | 2018-12-28 06:38:35.000 | 2        | 1

I tried using the dense_rank() function, but the results are far from what I need:

dense_rank() over (partition by user_id, date(timestamp) order by tag,date(timestamp) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

[Update]

The example I mentioned in my first comment on Gordon's answer is the following. For these consecutive events:

user_id | timestamp                 | event_id | tag 
    46  | 2018-12-23 06:11:35.000   | 7        | 1
    46  | 2018-12-23 07:51:35.000   | 8        | 1
    46  | 2018-12-23 07:26:35.000   | 9        | 1
    46  | 2018-12-23 07:37:35.000   | 10       | 1
    46  | 2018-12-23 08:05:35.000   | 11       | 1
    46  | 2018-12-23 08:20:35.000   | 12       | 1 
    46  | 2018-12-23 09:10:35.000   | 13       | 1

The query returns:

 user_id | min(timestamp)            | min_event_id | num_consecutive 
     46  | 2018-12-23 06:11:35.000   | 7            | 2

It should also return:

user_id | min(timestamp)            | min_event_id | num_consecutive 
     46  | 2018-12-23 06:11:35.000   | 7            | 2
     46  | 2018-12-23 07:37:35.000   | 10           | 2

Do you think this can be fetched as well?

This is a gaps-and-islands problem. A difference of row numbers seems like the best approach:

To get all runs of adjacent values:

select user_id, min(timestamp) as timestamp,
       count(*) as num_consecutive,
       min(event_id) as min_event_id
from (select t.*,
             row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
             row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
      from t
     ) t
group by user_id, timestamp::date, tag, (seqnum - seqnum_t);
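To see what the difference of row numbers does, here is a minimal sketch that runs the query above through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). Assumptions: the table is named t as in the answer; timestamp::date is written as date(timestamp) because SQLite has no :: cast (PostgreSQL accepts date(timestamp) too, as the dense_rank attempt shows); the user 46 rows come from the update, while the user 200 rows and the helper names build_demo_db and islands are hypothetical. Within one user and day, seqnum - seqnum_t stays constant while the tag does not change and shifts when it does, so each group is one run.

```python
import sqlite3

# User 46 rows are copied from the question's update. The user 200 rows are
# hypothetical: a tag = 0 row (event 5) splits two runs of tag = 1 rows.
ROWS = [
    (46, "2018-12-23 06:11:35.000", 7, 1),
    (46, "2018-12-23 07:51:35.000", 8, 1),
    (46, "2018-12-23 07:26:35.000", 9, 1),
    (46, "2018-12-23 07:37:35.000", 10, 1),
    (46, "2018-12-23 08:05:35.000", 11, 1),
    (46, "2018-12-23 08:20:35.000", 12, 1),
    (46, "2018-12-23 09:10:35.000", 13, 1),
    (200, "2018-12-24 08:00:00.000", 1, 1),
    (200, "2018-12-24 08:10:00.000", 2, 1),
    (200, "2018-12-24 08:20:00.000", 3, 1),
    (200, "2018-12-24 08:30:00.000", 4, 1),
    (200, "2018-12-24 08:40:00.000", 5, 0),
    (200, "2018-12-24 08:50:00.000", 6, 1),
    (200, "2018-12-24 09:00:00.000", 7, 1),
    (200, "2018-12-24 09:10:00.000", 8, 1),
    (200, "2018-12-24 09:20:00.000", 9, 1),
    (200, "2018-12-24 09:30:00.000", 10, 1),
]


def build_demo_db():
    """Create an in-memory table t; same-format ISO text timestamps sort in time order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (user_id integer, timestamp text, event_id integer, tag integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    return conn


# The answer's first query; timestamp::date is written as date(timestamp)
# because SQLite has no :: cast. tag is selected only to make the output readable.
ISLANDS_SQL = """
select user_id, tag, min(timestamp) as first_ts,
       count(*) as num_consecutive,
       min(event_id) as min_event_id
from (select t.*,
             row_number() over (partition by user_id, date(timestamp)
                                order by timestamp) as seqnum,
             row_number() over (partition by user_id, date(timestamp), tag
                                order by timestamp) as seqnum_t
      from t
     ) s
group by user_id, date(timestamp), tag, (seqnum - seqnum_t)
order by user_id, first_ts
"""


def islands(conn):
    """One row per run of equal tag values, per user and day."""
    return conn.execute(ISLANDS_SQL).fetchall()


for row in islands(build_demo_db()):
    print(row)
```

User 46 comes back as a single run of 7, and user 200 as runs of 4 (tag 1), 1 (tag 0) and 5 (tag 1).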

If you want each sequence on its own, just add where tag = 1 and having count(*) >= 3 to this query.
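A sketch of those two additions, under the same assumptions (SQLite stand-in via Python's sqlite3, date(timestamp) instead of timestamp::date, hypothetical user 200 rows and helper names). The where tag = 1 goes in the outer query, after the row numbers are computed. If it is placed inside the subquery instead, both row_number() calls number exactly the same rows, seqnum - seqnum_t is always 0, and every tag = 1 run on a day collapses into one group; that may be part of what went wrong in the attempt described in the comments.

```python
import sqlite3

# User 46 rows are from the question's update; user 200 rows are hypothetical.
ROWS = [
    (46, "2018-12-23 06:11:35.000", 7, 1),
    (46, "2018-12-23 07:51:35.000", 8, 1),
    (46, "2018-12-23 07:26:35.000", 9, 1),
    (46, "2018-12-23 07:37:35.000", 10, 1),
    (46, "2018-12-23 08:05:35.000", 11, 1),
    (46, "2018-12-23 08:20:35.000", 12, 1),
    (46, "2018-12-23 09:10:35.000", 13, 1),
    (200, "2018-12-24 08:00:00.000", 1, 1),
    (200, "2018-12-24 08:10:00.000", 2, 1),
    (200, "2018-12-24 08:20:00.000", 3, 1),
    (200, "2018-12-24 08:30:00.000", 4, 1),
    (200, "2018-12-24 08:40:00.000", 5, 0),
    (200, "2018-12-24 08:50:00.000", 6, 1),
    (200, "2018-12-24 09:00:00.000", 7, 1),
    (200, "2018-12-24 09:10:00.000", 8, 1),
    (200, "2018-12-24 09:20:00.000", 9, 1),
    (200, "2018-12-24 09:30:00.000", 10, 1),
]


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (user_id integer, timestamp text, event_id integer, tag integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    return conn


# Filter applied after numbering: tag = 1 runs of at least 3 rows.
RUNS_SQL = """
select user_id, min(timestamp) as first_ts,
       count(*) as num_consecutive,
       min(event_id) as min_event_id
from (select t.*,
             row_number() over (partition by user_id, date(timestamp)
                                order by timestamp) as seqnum,
             row_number() over (partition by user_id, date(timestamp), tag
                                order by timestamp) as seqnum_t
      from t
     ) s
where tag = 1
group by user_id, date(timestamp), tag, (seqnum - seqnum_t)
having count(*) >= 3
order by user_id, first_ts
"""

# Filter applied too early: tag = 0 rows never reach row_number(), so runs merge.
EARLY_FILTER_SQL = """
select user_id, min(timestamp) as first_ts,
       count(*) as num_consecutive,
       min(event_id) as min_event_id
from (select t.*,
             row_number() over (partition by user_id, date(timestamp)
                                order by timestamp) as seqnum,
             row_number() over (partition by user_id, date(timestamp), tag
                                order by timestamp) as seqnum_t
      from t
      where tag = 1
     ) s
group by user_id, date(timestamp), tag, (seqnum - seqnum_t)
having count(*) >= 3
order by user_id, first_ts
"""


def runs_of_three(conn):
    return conn.execute(RUNS_SQL).fetchall()


def filtered_too_early(conn):
    return conn.execute(EARLY_FILTER_SQL).fetchall()


print(runs_of_three(build_demo_db()))
print(filtered_too_early(build_demo_db()))
```

With the filter outside, user 200 keeps its two runs (4 and 5 rows); with the filter inside, they merge into one run of 9.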

To turn this into the result set you want, use a subquery:

-- the derived table exposes min_event_id, not event_id;
-- dividing each run by 3 before summing counts only whole triplets
select user_id, min(timestamp) as first_occurrence,
       min(min_event_id) as event_id,
       sum(num_consecutive / 3) as consecutive_events
from (select user_id, min(timestamp) as timestamp,
             count(*) as num_consecutive,
             min(event_id) as min_event_id
      from (select t.*,
                   row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
                   row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
            from t
           ) t
      where tag = 1
      group by user_id, timestamp::date, tag, (seqnum - seqnum_t)
     ) t
where num_consecutive >= 3
group by user_id, timestamp::date;
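The per-day summary can be checked the same way. This sketch (same SQLite stand-in, hypothetical user 200 rows and helper names) reads the smallest event id from the derived table's min_event_id column and sums num_consecutive / 3 per run rather than dividing the day's total by 3. With integer division that matters: for runs of 4 and 5 events on the same day, 4/3 + 5/3 = 1 + 1 = 2 whole triplets, whereas (4 + 5)/3 = 3.

```python
import sqlite3

# User 46 rows are from the question's update; user 200 rows are hypothetical
# and give one day with a 4-event run and a 5-event run.
ROWS = [
    (46, "2018-12-23 06:11:35.000", 7, 1),
    (46, "2018-12-23 07:51:35.000", 8, 1),
    (46, "2018-12-23 07:26:35.000", 9, 1),
    (46, "2018-12-23 07:37:35.000", 10, 1),
    (46, "2018-12-23 08:05:35.000", 11, 1),
    (46, "2018-12-23 08:20:35.000", 12, 1),
    (46, "2018-12-23 09:10:35.000", 13, 1),
    (200, "2018-12-24 08:00:00.000", 1, 1),
    (200, "2018-12-24 08:10:00.000", 2, 1),
    (200, "2018-12-24 08:20:00.000", 3, 1),
    (200, "2018-12-24 08:30:00.000", 4, 1),
    (200, "2018-12-24 08:40:00.000", 5, 0),
    (200, "2018-12-24 08:50:00.000", 6, 1),
    (200, "2018-12-24 09:00:00.000", 7, 1),
    (200, "2018-12-24 09:10:00.000", 8, 1),
    (200, "2018-12-24 09:20:00.000", 9, 1),
    (200, "2018-12-24 09:30:00.000", 10, 1),
]


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (user_id integer, timestamp text, event_id integer, tag integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    return conn


# One row per user and day; consecutive_events counts whole triplets per run.
PER_DAY_SQL = """
select user_id,
       min(first_ts) as first_occurrence,
       min(min_event_id) as event_id,
       sum(num_consecutive / 3) as consecutive_events
from (select user_id, min(timestamp) as first_ts,
             count(*) as num_consecutive,
             min(event_id) as min_event_id
      from (select t.*,
                   row_number() over (partition by user_id, date(timestamp)
                                      order by timestamp) as seqnum,
                   row_number() over (partition by user_id, date(timestamp), tag
                                      order by timestamp) as seqnum_t
            from t
           ) s
      where tag = 1
      group by user_id, date(timestamp), tag, (seqnum - seqnum_t)
     ) i
where num_consecutive >= 3
group by user_id, date(first_ts)
order by user_id, first_occurrence
"""


def per_day(conn):
    return conn.execute(PER_DAY_SQL).fetchall()


for row in per_day(build_demo_db()):
    print(row)
```

This still yields one row per user and day, so the first timestamp of a second triplet does not appear on its own row.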

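How is "consecutive" defined? By event_id or by timestamp?

@GordonLinoff By timestamp. Thank you very much, Gordon. This is the first time I have run into a gaps-and-islands problem, and I really like your answer. However, it only solves about 90% of the problem. The one remaining issue is that min(timestamp) does not return the "first" timestamp of the second triplet. Please see my updated question for a detailed example of this case. Thanks again!

Thanks for the update! I already tried adding where tag=1 and count(*) >= 3 in the first subquery, but it fetches wrong results. Apart from some triplets being missing, instead of the first timestamp of the second triplet this query fetches the timestamp of the last consecutive event (which, by the way, is not a triplet).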
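To get one row per triplet, as the update asks, one option (a variation on the answer above, not taken from it) is to number the rows inside each tag = 1 run, treat positions 0, 3, 6 and so on as triplet starts, and keep a start only if two more rows follow it in the same run. The sketch uses the same SQLite stand-in, hypothetical user 200 rows, and hypothetical names (ORDER_KEYS, triplets). One thing the sample data shows: as posted, event 8 (07:51:35) is later than event 10 (07:37:35), so ordering by timestamp makes the second triplet 8, 11, 12 starting at 07:51:35; the expected row at 07:37:35 with event_id 10 only comes out when rows are ordered by event_id. The sketch accepts either ordering.

```python
import sqlite3

# User 46 rows are from the question's update; user 200 rows are hypothetical.
ROWS = [
    (46, "2018-12-23 06:11:35.000", 7, 1),
    (46, "2018-12-23 07:51:35.000", 8, 1),
    (46, "2018-12-23 07:26:35.000", 9, 1),
    (46, "2018-12-23 07:37:35.000", 10, 1),
    (46, "2018-12-23 08:05:35.000", 11, 1),
    (46, "2018-12-23 08:20:35.000", 12, 1),
    (46, "2018-12-23 09:10:35.000", 13, 1),
    (200, "2018-12-24 08:00:00.000", 1, 1),
    (200, "2018-12-24 08:10:00.000", 2, 1),
    (200, "2018-12-24 08:20:00.000", 3, 1),
    (200, "2018-12-24 08:30:00.000", 4, 1),
    (200, "2018-12-24 08:40:00.000", 5, 0),
    (200, "2018-12-24 08:50:00.000", 6, 1),
    (200, "2018-12-24 09:00:00.000", 7, 1),
    (200, "2018-12-24 09:10:00.000", 8, 1),
    (200, "2018-12-24 09:20:00.000", 9, 1),
    (200, "2018-12-24 09:30:00.000", 10, 1),
]


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (user_id integer, timestamp text, event_id integer, tag integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    return conn


TRIPLETS_SQL = """
with numbered as (
    -- gaps and islands: grp is constant within a run of equal tag values
    select t.*,
           row_number() over (partition by user_id, date(timestamp) order by {order})
         - row_number() over (partition by user_id, date(timestamp), tag order by {order}) as grp
    from t
),
islands as (
    -- 0-based position inside each tag = 1 run, plus the run's length
    select user_id, timestamp, event_id,
           row_number() over (partition by user_id, date(timestamp), grp order by {order}) - 1 as pos,
           count(*) over (partition by user_id, date(timestamp), grp) as island_size
    from numbered
    where tag = 1
)
select user_id,
       timestamp as first_occurrence,
       event_id,
       island_size / 3 as consecutive_events
from islands
where pos % 3 = 0              -- first row of each triplet
  and pos + 3 <= island_size   -- keep complete triplets only
order by user_id, first_occurrence
"""

# Hypothetical helper: maps an ordering choice to a fixed ORDER BY list,
# so no caller-supplied text is pasted into the SQL.
ORDER_KEYS = {"timestamp": "timestamp, event_id", "event_id": "event_id"}


def triplets(conn, order="timestamp"):
    """One row per complete triplet: its first row plus the run's triplet count."""
    sql = TRIPLETS_SQL.format(order=ORDER_KEYS[order])
    return conn.execute(sql).fetchall()


for row in triplets(build_demo_db(), order="event_id"):
    print(row)
```

On PostgreSQL the same statement should work with the {order} placeholder replaced by the chosen column list; that has not been checked here.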