SQL Postgres: group consecutive items, breaking when the gap exceeds 10 minutes _Sql_Postgresql - Fatal编程技术网

Suppose I have a table containing:

id | type  | timestamp
1  | 'dog' | '2019-01-01T12:00:00Z'
2  | 'cat' | '2019-01-01T12:01:00Z'
3  | 'dog' | '2019-01-01T12:02:00Z'
4  | 'dog' | '2019-01-01T12:03:00Z'
5  | 'cat' | '2019-01-01T12:03:00Z'
6  | 'dog' | '2019-01-01T12:15:00Z'

I'd like to see:

starttime              | endtime                | count | type
'2019-01-01T12:00:00Z' | '2019-01-01T12:03:00Z' | 3     | 'dog'
'2019-01-01T12:01:00Z' | '2019-01-01T12:03:00Z' | 2     | 'cat'
'2019-01-01T12:15:00Z' | '2019-01-01T12:15:00Z' | 1     | 'dog'

Edit:

To clarify, I'm basically grouping activity into clusters, where a cluster is defined as continuous activity within 10 minutes.

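So in the example above, the first dog cluster has a count of 3 because it has 3 rows within 10 minutes, and no two consecutive rows are more than 10 minutes apart.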

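Use lag() to get the previous timestamp. Then take a cumulative count of the rows whose previous timestamp is more than 10 minutes earlier; that running count numbers the groups. Finally, aggregate: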

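select min(timestamp) as starttime, max(timestamp) as endtime, count(*), type
from (select t.*,
             -- running count of "breaks" (gap > 10 minutes) = group number within each type
             count(*) filter (where prev_ts < timestamp - interval '10 minute')
                 over (partition by type order by timestamp) as grp
      from (select t.*,
                   -- timestamp of the previous row of the same type (NULL for the first row)
                   lag(timestamp) over (partition by type order by timestamp) as prev_ts
            from t
           ) t
     ) t
group by type, grp
order by type, min(timestamp)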

This doesn't produce exactly the result in your question, but it may be what you actually want.
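To check the logic of the lag() query without a database, here is a minimal plain-Python sketch of the same gaps-and-islands idea (a model of the query, not Postgres itself): sort each type's rows by time, bump a running group counter whenever the gap to the previous row exceeds 10 minutes, then aggregate per group. The names ROWS, parse_ts and cluster_by_gap are hypothetical; the rows are the sample data from the question.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (id, type, timestamp)
ROWS = [
    (1, "dog", "2019-01-01T12:00:00Z"),
    (2, "cat", "2019-01-01T12:01:00Z"),
    (3, "dog", "2019-01-01T12:02:00Z"),
    (4, "dog", "2019-01-01T12:03:00Z"),
    (5, "cat", "2019-01-01T12:03:00Z"),
    (6, "dog", "2019-01-01T12:15:00Z"),
]

MAX_GAP = timedelta(minutes=10)


def parse_ts(s):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def cluster_by_gap(rows, max_gap=MAX_GAP):
    """Model of the lag()/running-count query: a new group starts whenever
    the previous row of the same type is more than max_gap earlier."""
    result = []
    keyed = sorted((typ, parse_ts(ts)) for _, typ, ts in rows)
    for typ, items in groupby(keyed, key=lambda r: r[0]):
        grp = 0         # running count of breaks, like count(*) filter (...) over (...)
        prev_ts = None  # like lag(timestamp) over (partition by type order by timestamp)
        groups = {}
        for _, ts in items:
            if prev_ts is not None and prev_ts < ts - max_gap:
                grp += 1  # more than max_gap since the previous row: new group
            groups.setdefault(grp, []).append(ts)
            prev_ts = ts
        # one output row per (type, grp), in group order
        for members in groups.values():
            result.append((min(members), max(members), len(members), typ))
    return result  # ordered by type, then start time, like the query


if __name__ == "__main__":
    for start, end, count, typ in cluster_by_gap(ROWS):
        print(start.isoformat(), end.isoformat(), count, typ)
```

On the sample data this yields the cat group (12:01 to 12:03, 2 rows) followed by the two dog groups (12:00 to 12:03 with 3 rows, and 12:15 alone), i.e. the expected rows, but ordered by type as the query's order by does.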

This one is a bit of a challenge:

with recursive c as (
    (
        -- anchor: the earliest row of each type opens cluster 1 with dt = 0
        select
            type,
            min(timestamp) as timestamp,
            '0'::interval as dt,
            1 as cl
        from t group by type) union all
    (
        -- recursive step: for each type, move to the next strictly later timestamp
        select distinct on (t.type)
            t.type,
            t.timestamp,
            -- dt: time accumulated since the cluster's first row; reset to 0 past 10 minutes
            case when dt + (t.timestamp - c.timestamp) > '10 min'::interval then '0'::interval else dt + (t.timestamp - c.timestamp) end,
            -- cl: cluster number; incremented whenever dt is reset
            case when dt + (t.timestamp - c.timestamp) > '10 min'::interval then cl + 1 else cl end
        from t join c on (t.type = c.type and t.timestamp > c.timestamp)
        order by t.type, t.timestamp))
-- one output row per (type, cluster)
select
    min(timestamp) as starttime,
    max(timestamp) as endtime,
    count(*) as "count",
    "type"
from c
group by "type", cl
order by 1;

┌─────────────────────┬─────────────────────┬───────┬──────┐
│      starttime      │       endtime       │ count │ type │
├─────────────────────┼─────────────────────┼───────┼──────┤
│ 2019-01-01 12:00:00 │ 2019-01-01 12:03:00 │     3 │ dog  │
│ 2019-01-01 12:01:00 │ 2019-01-01 12:03:00 │     2 │ cat  │
│ 2019-01-01 12:15:00 │ 2019-01-01 12:15:00 │     1 │ dog  │
└─────────────────────┴─────────────────────┴───────┴──────┘
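
I hope you're familiar with recursive CTEs.

Brief explanation:

The dt column holds the interval from the last checkpoint (the first row of the current cluster) to the current row. If adding the next step would make it exceed 10 minutes, it is reset to 0 and that row becomes the new checkpoint.

The cl column holds the cluster number. It is incremented whenever dt would exceed 10 minutes, i.e. whenever dt is reset.

Finally, for each type and cluster, we take the minimum and maximum timestamps as the start and end times, along with the number of rows.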
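The dt/cl bookkeeping described above can be modelled in plain Python; this is a sketch of the idea, not a reproduction of how Postgres evaluates a recursive CTE. The names ROWS, parse_ts and cluster_by_window are hypothetical. Like the CTE, which only steps to a strictly later timestamp, it keeps one row per distinct timestamp within a type, so duplicate timestamps of the same type are counted once.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (id, type, timestamp)
ROWS = [
    (1, "dog", "2019-01-01T12:00:00Z"),
    (2, "cat", "2019-01-01T12:01:00Z"),
    (3, "dog", "2019-01-01T12:02:00Z"),
    (4, "dog", "2019-01-01T12:03:00Z"),
    (5, "cat", "2019-01-01T12:03:00Z"),
    (6, "dog", "2019-01-01T12:15:00Z"),
]

WINDOW = timedelta(minutes=10)


def parse_ts(s):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def cluster_by_window(rows, window=WINDOW):
    """Model of the recursive CTE: dt accumulates the time since the
    cluster's first row; when the next step would push it past `window`,
    dt resets to 0 and cl increments, so that row opens a new cluster."""
    result = []
    keyed = sorted((typ, parse_ts(ts)) for _, typ, ts in rows)
    for typ, items in groupby(keyed, key=lambda r: r[0]):
        # one entry per distinct timestamp, mirroring the CTE's strict ">" join
        stamps = sorted({ts for _, ts in items})
        cl, dt = 1, timedelta(0)  # anchor row: cluster 1, dt = 0
        clusters = {cl: [stamps[0]]}
        for prev, ts in zip(stamps, stamps[1:]):
            if dt + (ts - prev) > window:
                cl, dt = cl + 1, timedelta(0)  # too far from the checkpoint: new cluster
            else:
                dt += ts - prev  # still inside the window: keep accumulating
            clusters.setdefault(cl, []).append(ts)
        # one output row per (type, cl)
        for members in clusters.values():
            result.append((min(members), max(members), len(members), typ))
    result.sort(key=lambda r: r[0])  # "order by 1": by start time
    return result


if __name__ == "__main__":
    for start, end, count, typ in cluster_by_window(ROWS):
        print(start.isoformat(), end.isoformat(), count, typ)
```

For the dog rows, dt grows 0, 2, 3 minutes; the step to 12:15 would make it 15 minutes, so dt resets and 12:15 becomes cluster 2, matching the psql output above.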


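What is the rule for merging rows? It isn't obvious: do you want runs that fit within a 10-minute span, or runs where each gap between consecutive events is no more than 10 minutes?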
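To make the two readings concrete, here is a small Python sketch with times as minutes past 12:00 (the function names and the event list are hypothetical). The gap rule is what the lag() answer implements; the "within 10 minutes of the cluster's first row" rule is what the recursive CTE's dt accumulation amounts to. On the question's sample data both give the same clusters; they only diverge when a run of short gaps spans more than 10 minutes.

```python
def split_by_gap(minutes, max_gap=10):
    """Gap rule: start a new cluster when the previous event is more than max_gap earlier."""
    clusters = []
    for m in sorted(minutes):
        if clusters and m - clusters[-1][-1] <= max_gap:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return clusters


def split_by_window(minutes, window=10):
    """Window rule: start a new cluster when the event is more than `window`
    after the first event of the current cluster."""
    clusters = []
    for m in sorted(minutes):
        if clusters and m - clusters[-1][0] <= window:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return clusters


# Hypothetical events where the two rules disagree
events = [0, 6, 12]
print(split_by_gap(events))     # [[0, 6, 12]]
print(split_by_window(events))  # [[0, 6], [12]]
```

Every gap here is 6 minutes, so the gap rule keeps a single cluster, while the window rule splits off the event at minute 12 because it falls 12 minutes after the cluster started.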