SQL: Find rare events in a rolling window on timestamps

sql performance postgresql


Given the following table:

CREATE TABLE gm_inductionloopdata
(
 "id" serial NOT NULL,
 "timestamp" timestamp without time zone NOT NULL,
 "count" integer NOT NULL DEFAULT 0
);

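I am looking for "rare events". A rare event is a row with the following properties:

Simple: `count = 1`

Hard: all rows within a 10-minute time span (before and after the current row's timestamp) have `count = 0` (except the given row itself, of course).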

For example:

id   timestamp  count
0    08:00      0    
1    08:11      0    
2    08:15      2     <== not rare event (count!=1)   
3    08:19      0    
4    08:24      0    
5    08:25      0   
6    08:29      1     <== not rare event (see 8:35)
7    08:31      0    
8    08:35      1    
9    08:40      0    
10   08:46      1     <== rare event!  
10   08:48      0   
10   08:51      0   
10   08:55      0   
10   08:58      1     <== rare event!  
10   09:02      0   
10   09:09      1
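
To experiment with the example, the sample rows can be loaded into a scratch table. This is only a minimal sketch: the table name `rare_demo`, the short column names `ts` and `ct`, and the date `2000-01-01` are assumptions made for illustration (the listing shows only times of day), and the repeated id 10 from the listing is replaced by generated ids.

```sql
-- Hypothetical scratch table; the names and the date are assumptions.
CREATE TEMP TABLE rare_demo (
    id serial PRIMARY KEY,
    ts timestamp without time zone NOT NULL,
    ct integer NOT NULL DEFAULT 0
);

-- The 17 sample rows from the listing above, in the same order.
INSERT INTO rare_demo (ts, ct) VALUES
    ('2000-01-01 08:00', 0),
    ('2000-01-01 08:11', 0),
    ('2000-01-01 08:15', 2),
    ('2000-01-01 08:19', 0),
    ('2000-01-01 08:24', 0),
    ('2000-01-01 08:25', 0),
    ('2000-01-01 08:29', 1),
    ('2000-01-01 08:31', 0),
    ('2000-01-01 08:35', 1),
    ('2000-01-01 08:40', 0),
    ('2000-01-01 08:46', 1),
    ('2000-01-01 08:48', 0),
    ('2000-01-01 08:51', 0),
    ('2000-01-01 08:55', 0),
    ('2000-01-01 08:58', 1),
    ('2000-01-01 09:02', 0),
    ('2000-01-01 09:09', 1);

-- Rows passing only the "simple" condition; the "hard" condition
-- (no other non-zero row within 10 minutes) still has to be applied.
SELECT id, ts
FROM rare_demo
WHERE ct = 1
ORDER BY ts;
```

The final query returns every row with a count of 1, including 08:29 and 08:35, which the 10-minute rule should then exclude.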

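I think this is a good use case for the `lag()` and `lead()` window functions. This query keeps only the rows with a non-zero count, fetches the previous and the next such row, and checks whether either of them is within 10 minutes: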
with cte as (
  select
      "id", "timestamp", "count",
      lag("timestamp") over(w) + '10 minutes'::interval as "lag_timestamp",
      lead("timestamp") over(w) - '10 minutes'::interval as "lead_timestamp"
  from gm_inductionloopdata as curr
  where curr."count" <> 0
  window w as (order by "timestamp")
)
select "id", "timestamp"
from cte
where
    "count" = 1 and
    ("lag_timestamp" is null or "lag_timestamp" < "timestamp") and
    ("lead_timestamp" is null or "lead_timestamp" > "timestamp")
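
To see why filtering before `lag()`/`lead()` matters, it can help to look at the gaps between neighbouring non-zero rows. The following is a sketch under assumptions: the rows are the non-zero rows from the question's example, the date `2000-01-01` is arbitrary, and the names `sample`, `neighbours`, `gap_before`, and `gap_after` are made up for illustration.

```sql
-- Non-zero rows from the question's example; the date is an arbitrary assumption.
WITH sample (ts, ct) AS (
    VALUES
        (timestamp '2000-01-01 08:15', 2),
        (timestamp '2000-01-01 08:29', 1),
        (timestamp '2000-01-01 08:35', 1),
        (timestamp '2000-01-01 08:46', 1),
        (timestamp '2000-01-01 08:58', 1),
        (timestamp '2000-01-01 09:09', 1)
),
neighbours AS (
    SELECT ts, ct,
           lag(ts)  OVER w AS prev_ts,   -- previous non-zero row, NULL for the first row
           lead(ts) OVER w AS next_ts    -- next non-zero row, NULL for the last row
    FROM sample
    WHERE ct <> 0
    WINDOW w AS (ORDER BY ts)
)
SELECT ts, ct,
       ts - prev_ts AS gap_before,
       next_ts - ts AS gap_after
FROM neighbours
ORDER BY ts;
```

For 08:46 the gaps are 11 and 12 minutes, so the row qualifies. For 08:29 the gap to 08:35 is only 6 minutes, so it does not. Because rows with a zero count are removed before the window functions run, the immediate neighbour is always the nearest row that could disqualify the current one.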


The same check can also be written with `not exists`:

select *
from gm_inductionloopdata as curr
where
    curr."count" = 1 and
    not exists (
        select *
        from gm_inductionloopdata as g
        where
            -- you can change this to between, I've used this just for readability
            g."timestamp" <= curr."timestamp" + '10 minutes'::interval and
            g."timestamp" >= curr."timestamp" - '10 minutes'::interval and
            g."id" <> curr."id" and
            -- any non-zero neighbour disqualifies the row, not only count = 1
            g."count" <> 0
    );

This may be faster still:

SELECT id, ts, ct
FROM  (
   SELECT id, ts, ct
        , lag (ts, 1, '-infinity') OVER (ORDER BY ts) AS prev_ts
        , lead(ts, 1, 'infinity')  OVER (ORDER BY ts) AS next_ts
   FROM   tbl
   WHERE  ct <> 0
   ) sub
WHERE  ct = 1
AND    prev_ts < ts - interval '10 min'
AND    next_ts > ts + interval '10 min'
ORDER  BY ts;

As an aside, please don't name your columns "count" or "timestamp", or use other keywords, function names, or type names as identifiers. The query above uses `tbl`, `ts`, and `ct` instead.

A partial index should also improve performance:

CREATE INDEX tbl_rare_idx ON tbl(ts) WHERE ct <> 0;

If you are on Postgres 9.2 or later, and given some preconditions, make that a covering index so that index-only scans become possible:

CREATE INDEX tbl_rare_covering_idx ON tbl(ts, ct, id) WHERE ct <> 0;

The order of columns matters: `ts` must come first and `ct` should come second. Other columns needed in the `SELECT` follow.

Test with `EXPLAIN ANALYZE` to see which query is faster and whether the index is used.

Comments:

Your query is wrong: the bounds of the `BETWEEN` in the `NOT EXISTS` are flipped (it should be `- '10 minutes' ...` and `+ '10 minutes' ...`). Also, have you tried an index on the timestamp column to see whether the current query performs well?

Are there really seven rows with the same ID? Is "timestamp" really a "time"? (In SQL databases, a timestamp usually means date and time, not just a time of day, which is an entirely different data type in PostgreSQL.)

The first example makes good use of lead and lag, but it does not solve the problem. As the question states, there must be no rows closer than 10 minutes unless their count is 0, and that does not mean those rows are directly adjacent (as lead and lag assume).

@MatheusOl My first query filters all rows with a count of 1 and then checks whether the nearest row is more than 10 minutes away, so it should do the trick.

But it will not work if a row within 10 minutes is two or more rows away. Take 08:29 from the question's example: its lead is 08:31, but 08:35 is also within 10 minutes, and you are not considering that row.

@MatheusOl Did you check the SQL Fiddle? There are two CTEs, and I take lag and lead from the already filtered list.

OK, now I get the "already filtered" part. But there is still a small problem if a nearby count is not exactly 1. That is easy to fix, though (note that I changed the row at 08:35).
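
The advice to test with `EXPLAIN ANALYZE` could be followed roughly as sketched below. Everything here is an assumption for illustration: the scratch table `rare_bench`, the index name `rare_bench_idx`, the synthetic one-row-per-minute data, and the 1% share of non-zero rows. Real data will produce different plans and timings.

```sql
-- Hypothetical test table filled with synthetic data; all names are assumptions.
CREATE TEMP TABLE rare_bench (
    id serial PRIMARY KEY,
    ts timestamp NOT NULL,
    ct integer   NOT NULL DEFAULT 0
);

-- One row per minute for about a year; roughly 1% of the rows get ct = 1.
INSERT INTO rare_bench (ts, ct)
SELECT g, (random() < 0.01)::int
FROM generate_series(timestamp '2000-01-01',
                     timestamp '2000-12-31',
                     interval '1 minute') AS g;

-- Partial index in the style of tbl_rare_idx from the answer.
CREATE INDEX rare_bench_idx ON rare_bench (ts) WHERE ct <> 0;

-- Refresh planner statistics so the plan reflects the loaded data.
ANALYZE rare_bench;

-- Show the chosen plan together with actual run times and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, ts, ct
FROM (
    SELECT id, ts, ct,
           lag (ts, 1, '-infinity') OVER (ORDER BY ts) AS prev_ts,
           lead(ts, 1, 'infinity')  OVER (ORDER BY ts) AS next_ts
    FROM rare_bench
    WHERE ct <> 0
) sub
WHERE ct = 1
  AND prev_ts < ts - interval '10 min'
  AND next_ts > ts + interval '10 min';
```

Running the same `EXPLAIN` over the other query variants against the same table allows a side-by-side comparison. Whether the planner actually uses the partial index depends on the data distribution and the PostgreSQL version, so the plan output should be checked rather than assumed.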