In PostgreSQL, how do I find 3 consecutive events within a time range, returning only the first event of each? (sql, postgresql)

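I have a table with user ids, timestamps and event ids. The column "tag" indicates whether a row is desired (tag = 1) or not (tag = 0).

So I would like to find:

  • the number of times a user had 3 correct (tag = 1) consecutive events (i.e. a triplet) within the same date
  • the timestamp of the first event in each of these triplets

Ideally, the returned table should look like this: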

    user_id | first_occurrence           |event_id | consecutive_events 
         46 | 2018-12-23 06:11:35.000    | 7       | 2  <-- 2 consecutive triplets 
         46 | 2018-12-23 07:37:35.000    | 10      | 2  <-- this has 4 consecutive events  but I am only interested in triplets of events.
         122| 2018-12-23 06:11:35.000    | 4       | 1
         122| 2018-12-28 06:38:35.000    | 2       | 1  
    
I tried using the DENSE_RANK() function, but the result is far from optimal:

    dense_rank() over (partition by user_id, date(timestamp) order by tag,date(timestamp) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    
[Update]

The example I mentioned in my first comment on Gordon's answer is as follows. For these consecutive events:

    user_id | timestamp                 | event_id | tag 
        46  | 2018-12-23 06:11:35.000   | 7        | 1
        46  | 2018-12-23 07:51:35.000   | 8        | 1
        46  | 2018-12-23 07:26:35.000   | 9        | 1
        46  | 2018-12-23 07:37:35.000   | 10       | 1
        46  | 2018-12-23 08:05:35.000   | 11       | 1
        46  | 2018-12-23 08:20:35.000   | 12       | 1 
        46  | 2018-12-23 09:10:35.000   | 13       | 1
    
the query returns:

     user_id | min(timestamp)            | min_event_id | num_consecutive 
         46  | 2018-12-23 06:11:35.000   | 7            | 2
    
It should return:

    user_id | min(timestamp)            | min_event_id | num_consecutive 
         46  | 2018-12-23 06:11:35.000   | 7            | 2
         46  | 2018-12-23 07:37:35.000   | 10           | 2
    

Do you think this can be obtained as well?

This is a gaps-and-islands problem. The difference of row numbers seems like the best approach.

To get all the adjacent values:

    select user_id, min(timestamp) as timestamp,
           count(*) as num_consecutive,
           min(event_id) as min_event_id
    from (select t.*,
                 row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
                 row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
          from t
         ) t
    group by user_id, timestamp::date, tag, (seqnum - seqnum_t);
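
To see why the difference of the two row numbers identifies a run, it can help to look at the intermediate columns. The following is only a sketch: the inline rows are hypothetical (they are not the asker's data) and stand in for table t, and the column name grp is introduced here for illustration.

```sql
-- Hypothetical rows for one user on one day; tag pattern is 1, 1, 0, 1, 1, 1.
with t (user_id, timestamp, event_id, tag) as (
    values (1, timestamp '2018-12-23 06:00:00', 1, 1),
           (1, timestamp '2018-12-23 06:10:00', 2, 1),
           (1, timestamp '2018-12-23 06:20:00', 3, 0),
           (1, timestamp '2018-12-23 06:30:00', 4, 1),
           (1, timestamp '2018-12-23 06:40:00', 5, 1),
           (1, timestamp '2018-12-23 06:50:00', 6, 1)
)
select event_id, tag, seqnum, seqnum_t,
       seqnum - seqnum_t as grp   -- constant within each run of equal tags
from (select t.*,
             -- position among all of the user's events that day
             row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
             -- position among the user's events with the same tag that day
             row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
      from t
     ) s
order by s.timestamp;
```

On these rows, events 1 and 2 get grp = 0, event 3 (tag = 0) gets grp = 2, and events 4 to 6 get grp = 1. The value of grp is only meaningful together with tag, which is why both appear in the GROUP BY of the query above.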
    
If you want each sequence separately, just add where tag = 1 and having count(*) >= 3 to this query.
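
With both clauses added, the query might look like the sketch below. The inline rows are hypothetical and only stand in for table t so that the statement can be run on its own.

```sql
-- Hypothetical rows for one user on one day; tag pattern is 1, 1, 0, 1, 1, 1.
with t (user_id, timestamp, event_id, tag) as (
    values (1, timestamp '2018-12-23 06:00:00', 1, 1),
           (1, timestamp '2018-12-23 06:10:00', 2, 1),
           (1, timestamp '2018-12-23 06:20:00', 3, 0),
           (1, timestamp '2018-12-23 06:30:00', 4, 1),
           (1, timestamp '2018-12-23 06:40:00', 5, 1),
           (1, timestamp '2018-12-23 06:50:00', 6, 1)
)
select user_id, min(timestamp) as timestamp,
       count(*) as num_consecutive,
       min(event_id) as min_event_id
from (select t.*,
             row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
             row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
      from t
     ) t
where tag = 1                 -- filter in the outer query, after row numbers exist
group by user_id, timestamp::date, tag, (seqnum - seqnum_t)
having count(*) >= 3;         -- keep only runs of at least three events
```

On the hypothetical rows this returns a single row for the run that starts at event 4 (06:30:00) with num_consecutive = 3. The filter on tag belongs in the outer query: inside the inner subquery it would make the two row numbers identical, so all tag = 1 rows of a user and day would fall into one group.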

To turn this into the desired result set, use a subquery:

    -- the subquery exposes min_event_id, not event_id
    select user_id, min(min_event_id), min(timestamp),
           -- whole triplets per run (integer division), summed per user and day
           sum(num_consecutive / 3)
    from (select user_id, min(timestamp) as timestamp,
                 count(*) as num_consecutive,
                 min(event_id) as min_event_id
          from (select t.*,
                       row_number() over (partition by user_id, timestamp::date order by timestamp) as seqnum,
                       row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as seqnum_t
                from t
               ) t
          where tag = 1
          group by user_id, timestamp::date, tag, (seqnum - seqnum_t)
         ) t
    where num_consecutive >= 3
    group by user_id, timestamp::date;
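
The question's update asks for one row per triplet, with the first timestamp of each. One possible way to get that, sketched here under assumptions and not taken from the answer above, is to number the rows inside each run and keep every third row that starts a complete block of three. The CTE names islands and numbered, the columns grp, pos and run_len, and the inline rows are all hypothetical; triplets are assumed to be non-overlapping and counted from the start of each run.

```sql
-- Hypothetical rows: a run of seven tag = 1 events, one tag = 0 event, then a run of two.
with t (user_id, timestamp, event_id, tag) as (
    values (1, timestamp '2018-12-23 06:00:00',  1, 1),
           (1, timestamp '2018-12-23 06:10:00',  2, 1),
           (1, timestamp '2018-12-23 06:20:00',  3, 1),
           (1, timestamp '2018-12-23 06:30:00',  4, 1),
           (1, timestamp '2018-12-23 06:40:00',  5, 1),
           (1, timestamp '2018-12-23 06:50:00',  6, 1),
           (1, timestamp '2018-12-23 07:00:00',  7, 1),
           (1, timestamp '2018-12-23 07:10:00',  8, 0),
           (1, timestamp '2018-12-23 07:20:00',  9, 1),
           (1, timestamp '2018-12-23 07:30:00', 10, 1)
),
islands as (
    -- same difference-of-row-numbers trick as above
    select t.*,
           row_number() over (partition by user_id, timestamp::date order by timestamp)
         - row_number() over (partition by user_id, timestamp::date, tag order by timestamp) as grp
    from t
),
numbered as (
    select i.*,
           -- 0-based position of the row inside its run of tag = 1 events
           row_number() over (partition by user_id, timestamp::date, grp order by timestamp) - 1 as pos,
           -- total length of that run
           count(*) over (partition by user_id, timestamp::date, grp) as run_len
    from islands i
    where tag = 1
)
select n.user_id,
       n.timestamp as first_occurrence,
       n.event_id,
       -- number of complete triplets for this user on this day
       count(*) over (partition by n.user_id, n.timestamp::date) as consecutive_events
from numbered n
where n.pos % 3 = 0            -- first row of each block of three
  and n.pos + 3 <= n.run_len   -- the block is complete
order by n.user_id, n.timestamp;
```

On the hypothetical rows this returns two rows, for events 1 (06:00:00) and 4 (06:30:00), each with consecutive_events = 2. Event 7 and the run of two are dropped because they do not form a complete triplet.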
    

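How is "consecutive" defined? By event_id or by timestamp?

@GordonLinoff By timestamp.

Thank you very much, Gordon. This is the first time I have run into a gaps-and-islands problem, and I really like your answer. However, it solves about 90% of the problem. The only issue is that min(timestamp) does not return the "first" timestamp of the second triplet. Please see my updated question for a detailed example of this case. Thanks a lot again!

Thanks for the update! I have tried adding where tag = 1 and having count(*) >= 3 to the first subquery, but it fetches wrong results. Apart from the fact that some triplets are missing, this query fetches the timestamp of the last consecutive event (which, by the way, is not a triplet) instead of the first timestamp of the second triplet.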