SQL/Postgres: group consecutive items, breaking when the gap exceeds 10 minutes

sql postgresql

Suppose I have a table like this:

id | type  | timestamp
1  | 'dog' | '2019-01-01T12:00:00Z'
2  | 'cat' | '2019-01-01T12:01:00Z'
3  | 'dog' | '2019-01-01T12:02:00Z'
4  | 'dog' | '2019-01-01T12:03:00Z'
5  | 'cat' | '2019-01-01T12:03:00Z'
6  | 'dog' | '2019-01-01T12:15:00Z'
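To try the answers below, the sample rows can be loaded into a table. This is a sketch under assumptions: the question gives no schema, so the table name `t` (the name the answers query), the `timestamptz` column type and the primary key are all guesses. Note that it drops any existing table named `t`.

```sql
-- Hypothetical schema: the question does not show the real DDL.
-- Table and column names match what the answers below query.
drop table if exists t;  -- careful: removes an existing table named t

create table t (
    id          integer primary key,
    type        text        not null,
    "timestamp" timestamptz not null  -- quoted only for readability; timestamp is also a type name
);

-- Sample rows copied from the question (the Z suffix means UTC).
insert into t (id, type, "timestamp") values
    (1, 'dog', '2019-01-01T12:00:00Z'),
    (2, 'cat', '2019-01-01T12:01:00Z'),
    (3, 'dog', '2019-01-01T12:02:00Z'),
    (4, 'dog', '2019-01-01T12:03:00Z'),
    (5, 'cat', '2019-01-01T12:03:00Z'),
    (6, 'dog', '2019-01-01T12:15:00Z');
```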

I'd like to get:

starttime | endtime | count | type
'2019-01-01T12:00:00Z', '2019-01-01T12:03:00Z', 3, 'dog'
'2019-01-01T12:01:00Z', '2019-01-01T12:03:00Z', 2, 'cat'
'2019-01-01T12:15:00Z', '2019-01-01T12:15:00Z', 1, 'dog'

Edit:

To clarify: I'm basically grouping activity into clusters, where a cluster is defined as consecutive activity within 10 minutes.

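So in the example above, the first dog cluster has a count of 3: it has 3 rows within 10 minutes, and no two consecutive rows are more than 10 minutes apart.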

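Use lag() to get the previous timestamp. Then take a cumulative count of the rows whose previous timestamp is more than 10 minutes earlier; that running count is the cluster number. Finally, aggregate: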

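select min(timestamp) as starttime,
       max(timestamp) as endtime,
       count(*) as "count",
       type
from (select t.*,
             -- running count of "breaks" (gap > 10 minutes) per type = cluster number
             count(*) filter (where prev_ts < timestamp - interval '10 minute') over (partition by type order by timestamp) as grp
      from (select t.*,
                   -- previous timestamp of the same type (NULL for the first row)
                   lag(timestamp) over (partition by type order by timestamp) as prev_ts
            from t
           ) t
     ) t
group by type, grp
order by type, min(timestamp);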

This doesn't produce exactly the result shown in your question, but it may be what you actually want.
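To see what the two window functions compute before the final aggregation, the inner subqueries can be run on their own. This is a self-contained sketch: the CTE named `t` reproduces the question's sample rows and stands in for the real table (an assumption). On this data every row gets `grp = 0` except the 12:15 dog row, whose predecessor (12:03) is more than 10 minutes earlier, so it alone opens group 1.

```sql
-- The CTE t stands in for the question's table (sample rows copied from the question).
with t (id, type, timestamp) as (
    values (1, 'dog', timestamptz '2019-01-01T12:00:00Z'),
           (2, 'cat', timestamptz '2019-01-01T12:01:00Z'),
           (3, 'dog', timestamptz '2019-01-01T12:02:00Z'),
           (4, 'dog', timestamptz '2019-01-01T12:03:00Z'),
           (5, 'cat', timestamptz '2019-01-01T12:03:00Z'),
           (6, 'dog', timestamptz '2019-01-01T12:15:00Z')
)
-- Same two steps as the answer, but without the final GROUP BY,
-- so prev_ts and grp are visible for every row.
select id, type, timestamp, prev_ts,
       count(*) filter (where prev_ts < timestamp - interval '10 minute')
           over (partition by type order by timestamp) as grp
from (select t.*,
             lag(timestamp) over (partition by type order by timestamp) as prev_ts
      from t
     ) s
order by type, timestamp;
```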

This one is a bit of a challenge:

with recursive c as (
    -- Anchor: the earliest row of each type starts cluster 1 with dt = 0
    (
        select
            type,
            min(timestamp) as timestamp,
            '0'::interval as dt,
            1 as cl
        from t group by type
    )
    union all
    -- Step: for each type, move to the next later row; if the time since the
    -- cluster's first row would exceed 10 minutes, reset dt and bump cl
    (
        select distinct on (t.type)
            t.type,
            t.timestamp,
            case when dt + (t.timestamp - c.timestamp) > '10 min'::interval then '0'::interval else dt + (t.timestamp - c.timestamp) end,
            case when dt + (t.timestamp - c.timestamp) > '10 min'::interval then cl + 1 else cl end
        from t join c on (t.type = c.type and t.timestamp > c.timestamp)
        order by t.type, t.timestamp
    )
)
select
    min(timestamp) as starttime,
    max(timestamp) as endtime,
    count(*) as "count",
    "type"
from c
group by "type", cl
order by 1;

┌─────────────────────┬─────────────────────┬───────┬──────┐
│      starttime      │       endtime       │ count │ type │
├─────────────────────┼─────────────────────┼───────┼──────┤
│ 2019-01-01 12:00:00 │ 2019-01-01 12:03:00 │     3 │ dog  │
│ 2019-01-01 12:01:00 │ 2019-01-01 12:03:00 │     2 │ cat  │
│ 2019-01-01 12:15:00 │ 2019-01-01 12:15:00 │     1 │ dog  │
└─────────────────────┴─────────────────────┴───────┴──────┘

I hope you're familiar with recursive CTEs.

Brief explanation:

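The dt column holds the interval from the last checkpoint (the first row of the current cluster) to the current row. If adding the next gap would make it exceed 10 minutes, it is reset to 0.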

The cl column holds the cluster number. It is incremented whenever dt would exceed 10 minutes.

Finally, we take the minimum and maximum timestamps as the start and end times, along with the number of rows, for each type and cluster.
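To see the dt and cl values described above, one can select from c directly instead of aggregating. This is a self-contained sketch: the CTE `t` reproduces the question's sample rows and stands in for the real table (an assumption). On this data the cat rows come out with cl = 1, 1 and the dog rows with cl = 1, 1, 1, 2; the 12:15 dog row starts cluster 2 because dt (3 minutes at 12:03) plus the 12-minute gap exceeds 10 minutes.

```sql
-- The CTE t stands in for the question's table (sample rows copied from the question).
with recursive t (id, type, timestamp) as (
    values (1, 'dog', timestamptz '2019-01-01T12:00:00Z'),
           (2, 'cat', timestamptz '2019-01-01T12:01:00Z'),
           (3, 'dog', timestamptz '2019-01-01T12:02:00Z'),
           (4, 'dog', timestamptz '2019-01-01T12:03:00Z'),
           (5, 'cat', timestamptz '2019-01-01T12:03:00Z'),
           (6, 'dog', timestamptz '2019-01-01T12:15:00Z')
),
c as (
    -- Anchor: the earliest row of each type opens cluster 1 with dt = 0.
    (select type, min(timestamp) as timestamp, '0'::interval as dt, 1 as cl
     from t
     group by type)
    union all
    -- Step: for each type, move to the next later row and update dt / cl.
    (select distinct on (t.type)
            t.type,
            t.timestamp,
            case when c.dt + (t.timestamp - c.timestamp) > '10 min'::interval
                 then '0'::interval
                 else c.dt + (t.timestamp - c.timestamp) end,
            case when c.dt + (t.timestamp - c.timestamp) > '10 min'::interval
                 then c.cl + 1
                 else c.cl end
     from t
     join c on t.type = c.type and t.timestamp > c.timestamp
     order by t.type, t.timestamp)
)
-- No aggregation: one row per visited input row, showing dt and cl.
select type, timestamp, dt, cl
from c
order by type, timestamp;
```

Because the step only follows rows with `t.timestamp > c.timestamp`, two rows of the same type with identical timestamps would not both be visited; the sample data has no such ties.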

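What is the rule for merging rows? It is not obvious: do you want series that fit within 10 minutes overall, or series in which no step between consecutive rows exceeds 10 minutes?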
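The two answers above differ exactly on this point. As a sketch with hypothetical data (three 'dog' rows six minutes apart, not taken from the question), the lag-based query keeps all three rows in one group, because no single gap exceeds 10 minutes. Tracing the recursive CTE by hand on the same rows gives dt = 0, then 6 minutes, then 6 + 6 = 12 minutes > 10 minutes, so under that rule the third row would start a second cluster.

```sql
-- Hypothetical rows, not from the question: consecutive gaps of 6 minutes.
with t (id, type, timestamp) as (
    values (1, 'dog', timestamptz '2019-01-01T12:00:00Z'),
           (2, 'dog', timestamptz '2019-01-01T12:06:00Z'),
           (3, 'dog', timestamptz '2019-01-01T12:12:00Z')
)
-- Gap rule from the lag-based answer: a new group starts only when the
-- previous row of the same type is more than 10 minutes earlier.
select min(timestamp) as starttime,
       max(timestamp) as endtime,
       count(*) as "count",
       type
from (select s.*,
             count(*) filter (where prev_ts < timestamp - interval '10 minute')
                 over (partition by type order by timestamp) as grp
      from (select t.*,
                   lag(timestamp) over (partition by type order by timestamp) as prev_ts
            from t) s
     ) g
group by type, grp;
```

This returns a single row spanning 12:00 to 12:12 with a count of 3.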