Finding gaps in huge event streams?

sql mongodb algorithm postgresql

I have about 1 million events in a PostgreSQL database, in the following format:

id        |   stream_id     |  timestamp
----------+-----------------+-----------------
1         |   7             |  ....
2         |   8             |  ....

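There are about 50,000 unique streams.

I need to find all of the events where the time between two consecutive events exceeds a certain period. In other words, I need to find pairs of events between which no other event occurred for a certain period of time.

For example: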

a b c d   e     f              g         h   i  j k
| | | |   |     |              |         |   |  | | 

                \____2 mins____/

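In this scenario, I want to find the pair (f, g), since those are the events immediately on either side of a gap.

I don't mind if the query is (that) slow; on 1 million records, an hour or so would be fine. However, the data set will keep growing, so ideally a slow query would still scale sanely.

I also have the data in MongoDB.

What is the best way to perform this query?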

In Postgres this query can be done very easily with the help of the lag() window function. Take the fiddle below as an example:

PostgreSQL 9.3 Schema Setup:

CREATE TABLE Table1
    ("id" int, "stream_id" int, "timestamp" timestamp)
;

INSERT INTO Table1
    ("id", "stream_id", "timestamp")
VALUES
    (1, 7, '2015-06-01 15:20:30'),
    (2, 7, '2015-06-01 15:20:31'),
    (3, 7, '2015-06-01 15:20:32'),
    (4, 7, '2015-06-01 15:25:30'),
    (5, 7, '2015-06-01 15:25:31')
;

Query 1:

-- For each row, fetch the timestamp and id of the previous event in the same stream.
-- Ordering by id assumes that ids increase with time.
with c as (select *,
           lag("timestamp") over(partition by stream_id order by id) as pre_time,
           lag(id) over(partition by stream_id order by id) as pre_id
           from Table1
          )
select * from c where "timestamp" - pre_time > interval '2 sec';

Results:

| id | stream_id |              timestamp |               pre_time | pre_id |
|----|-----------|------------------------|------------------------|--------|
|  4 |         7 | June, 01 2015 15:25:30 | June, 01 2015 15:20:32 |      3 |

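You can do this with the lag() window function over a partition by stream_id that is ordered by the timestamp.

The lag() function gives you access to previous rows in the partition; without an explicit offset, it returns the immediately preceding row. So if the partition on stream_id is ordered by time, the previous row is the previous event for that stream_id.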

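-- Window functions and output aliases cannot be used in WHERE, so compute diff in a subquery first.
SELECT *
FROM (
    SELECT stream_id, lag(id) OVER pair AS start_id, id AS end_id,
           ("timestamp" - lag("timestamp") OVER pair) AS diff
    FROM my_table
    WINDOW pair AS (PARTITION BY stream_id ORDER BY "timestamp")
) AS gaps
WHERE diff > interval '2 minutes';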
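
To make the logic of the window query concrete, here is a minimal Python sketch of the same idea outside the database: sort events by stream and time, then compare each event with the one before it in the same stream. The function name `find_gaps` and the tuple layout `(id, stream_id, timestamp)` are assumptions made for illustration, and the sketch holds all events in memory, so it is not a substitute for the SQL on a large table.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter


def find_gaps(events, min_gap):
    """Yield (stream_id, start_id, end_id, diff) for consecutive events
    of the same stream that are more than min_gap apart.

    events: iterable of (id, stream_id, timestamp) tuples (assumed layout).
    """
    # Sort by stream, then by time: the counterpart of
    # PARTITION BY stream_id ORDER BY "timestamp".
    ordered = sorted(events, key=itemgetter(1, 2))
    for stream_id, group in groupby(ordered, key=itemgetter(1)):
        prev = None  # plays the role of lag(): nothing before the first row
        for event_id, _, ts in group:
            if prev is not None and ts - prev[1] > min_gap:
                yield stream_id, prev[0], event_id, ts - prev[1]
            prev = (event_id, ts)


# Sample rows copied from the fiddle's INSERT statement.
events = [
    (1, 7, datetime(2015, 6, 1, 15, 20, 30)),
    (2, 7, datetime(2015, 6, 1, 15, 20, 31)),
    (3, 7, datetime(2015, 6, 1, 15, 20, 32)),
    (4, 7, datetime(2015, 6, 1, 15, 25, 30)),
    (5, 7, datetime(2015, 6, 1, 15, 25, 31)),
]
gaps = list(find_gaps(events, timedelta(minutes=2)))
```

With the sample rows, `gaps` contains a single entry for stream 7, pairing event 3 with event 4, which matches the row returned by the fiddle.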

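Are events defined by a single point in time (as opposed to a time span)? Do you want the gaps defined per stream? In other words, is a gap between events that belong to a single stream?

@MOehm Yes, a single point in time.

Do you also need a solution for MongoDB? Can you show us sample documents from MongoDB and the expected output?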