SQL Server: select all customers who logged in at least three times within a time interval

Tags: sql-server, tsql, datetime, sql-server-2014

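I have a login table that contains the customer ID and the login timestamp (columns customerid, timestamp).

I want to get all customer IDs that logged in at least three times within 60 minutes. By the way, the login table is huge, so a self-join is not an option.

For example: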

customerid | timestamp
1 | 2016-08-16 00:00
2 | 2016-08-16 00:00
3 | 2016-08-16 00:00
1 | 2016-08-16 00:25
2 | 2016-08-16 01:25
3 | 2016-08-16 00:25
1 | 2016-08-16 00:47
2 | 2016-08-16 01:27
3 | 2016-08-16 02:25
3 | 2016-08-16 03:25
1 | 2016-08-16 01:05

For this example, the query should return only customerid 1. Any ideas?

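Use a self-join:

    -- LoginTable stands in for the placeholder name Table, which is a reserved word.
    -- For each login T1, count the logins of the same customer that fall into
    -- the 60 minutes starting at T1 (T1 itself included).
    -- Assumes a customer never has two rows with the same Time.
    SELECT DISTINCT T1.customer_id
    FROM LoginTable T1
    JOIN LoginTable T2
      ON T1.customer_id = T2.customer_id
     AND T2.[Time] >= T1.[Time]
     AND T2.[Time] <= DATEADD(minute, 60, T1.[Time])
    GROUP BY T1.customer_id, T1.[Time]
    HAVING COUNT(*) >= 3;

Tested on rextester.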


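This query returns the customer IDs that logged in at least three times during the last 60 minutes, counted back from now:

SELECT customerid FROM
(SELECT customerid, COUNT(*) AS loginnumber FROM LoginTable
 WHERE [timestamp] > DATEADD(minute, -60, GETDATE())  -- filter before grouping
 GROUP BY customerid) LT
WHERE loginnumber >= 3;

Hope it helps.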


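I adapted the PostgreSQL solution to MSSQL.

I could only test on mssql, and the following seems to work there; hopefully your mssql version supports analytic (window) functions as well.

With this approach no self-join is needed and the table is scanned only once.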

 CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL)

 INSERT INTO loginTable (id, timestamp)
       SELECT 1, '2016-08-16 00:00'
 UNION SELECT 2, '2016-08-16 00:00'
 UNION SELECT 3, '2016-08-16 00:00'
 UNION SELECT 1, '2016-08-16 00:25'
 UNION SELECT 2, '2016-08-16 01:25'
 UNION SELECT 3, '2016-08-16 00:25'
 UNION SELECT 1, '2016-08-16 00:47'
 UNION SELECT 2, '2016-08-16 01:27'
 UNION SELECT 3, '2016-08-16 02:25'
 UNION SELECT 3, '2016-08-16 03:25'
 UNION SELECT 1, '2016-08-16 01:05';

 select id,  min_t, max_t from (
 select id,
         min(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as min_t, 
         max(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as max_t,
         count(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as num_t
   from loginTable
 ) ts_data    
  where ABS(DATEDIFF(minute,min_t,max_t)) <= 60 and num_t=3;
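
Thanks to @Salvador for sharing a test script.

Explanation: the idea is to scan the login table only once, ordered by timestamp, while keeping the three most recent logins of each id in memory. If the smallest and the largest of those three timestamps lie within 60 minutes of each other, we almost have the result.

Finally, we have to handle an edge case: at a customer's first or second login, the minimum and maximum timestamps can also lie within 60 minutes of each other (for the first login they are identical).

Those rows do not satisfy the OP's requirement, which asks for three separate logins, so we also have to count the logins in the window and make sure there are three (num_t = 3).

Edit: thanks again to Salvador for the warning.

The first version had a bug: in the window specification I wrote rows between 3 preceding and current row. I do need to look at three rows, but the current row is one of them, so the frame has to be rows between 2 preceding and current row.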
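
The three-row window described above can also be written with LAG, comparing each login with the login two rows earlier for the same customer. This is only a sketch, assuming the loginTable(id, timestamp) sample table from this answer and a SQL Server version that supports LAG (2012 or later); the aliases ts, ts_two_back and w are made up for illustration.

```sql
-- Assumes loginTable(id, timestamp) as created in the setup script above.
SELECT DISTINCT id
FROM (
    SELECT id,
           [timestamp] AS ts,
           -- Login two rows earlier for the same customer;
           -- NULL for a customer's first and second login.
           LAG([timestamp], 2) OVER (PARTITION BY id ORDER BY [timestamp]) AS ts_two_back
    FROM loginTable
) AS w
WHERE ts_two_back IS NOT NULL                       -- at least three logins so far
  AND ts <= DATEADD(minute, 60, ts_two_back);       -- all three inside 60 minutes
```

On the sample rows this should return only id 1. The comparison uses DATEADD rather than DATEDIFF(minute, ...) because DATEDIFF counts the minute boundaries crossed, which can be off by up to a minute once the timestamps carry seconds.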

A simple way to find the cases that do not cross an hour boundary is:

select
  id,
  datepart(yy,timestamp) as yy,
  datepart(mm,timestamp) as mm,
  datepart(dd,timestamp) as dd,
  datepart(hh,timestamp) as hh,
  count(*)
from
  logintable
group by
  id,
  datepart(yy,timestamp),
  datepart(mm,timestamp),
  datepart(dd,timestamp),
  datepart(hh,timestamp)
having
  count(*) >= 3

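If your table is very large, you could first pare it down to the customers with at least three logins on a single day and then self-join. That will still miss logins that straddle a day boundary, but it is a simple solution that can keep you moving while you work on a more sophisticated one.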
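
A rough sketch of that pre-filtering idea, assuming the loginTable(id, timestamp) sample table used elsewhere in this thread and that a customer never has two rows with the same timestamp. The CTE names busy_days and candidates are hypothetical, chosen only for this example.

```sql
-- Step 1: keep only (customer, day) pairs with at least three logins.
WITH busy_days AS (
    SELECT id, CAST([timestamp] AS date) AS login_day
    FROM loginTable
    GROUP BY id, CAST([timestamp] AS date)
    HAVING COUNT(*) >= 3
),
-- Step 2: the login rows that belong to those pairs.
candidates AS (
    SELECT l.id, l.[timestamp] AS ts
    FROM loginTable AS l
    JOIN busy_days AS b
      ON b.id = l.id
     AND b.login_day = CAST(l.[timestamp] AS date)
)
-- Step 3: self-join only the reduced set; for each login a, count the
-- logins of the same customer in the 60 minutes starting at a.
SELECT DISTINCT a.id
FROM candidates AS a
JOIN candidates AS b
  ON b.id = a.id
 AND b.ts >= a.ts
 AND b.ts <= DATEADD(minute, 60, a.ts)
GROUP BY a.id, a.ts
HAVING COUNT(*) >= 3;
```

How much this helps depends on how many customers pass the day-level filter, and a 60-minute window that straddles midnight can still be missed.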

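Which RDBMS and version are you using?

SQL 2014 Web Edition.

Won't this return a customer who visited the site twice a day on three days?

It looks like SQL Server, as mentioned in the comments.

Yes, I know. So I assume he can easily reproduce the same behavior in MS SQL?

Actually MySQL is not good at this either, which is why I eventually migrated to PostgreSQL. Its window functions make this kind of correlated analysis easier.

SQL Server's SQL dialect differs a lot from MySQL's, so that is not easy to port. A window-based PostgreSQL solution would probably be easier to migrate.

I have never seen a query like this, I like it! However, trying it with this set returns no results: SELECT 1, '2016-08-16 00:00' UNION SELECT 2, '2016-08-16 00:00' UNION SELECT 3, '2016-08-16 00:00' UNION SELECT 1, '2016-08-16 01:25' UNION SELECT 2, '2016-08-16 01:25' UNION SELECT 3, '2016-08-16 01:47' UNION SELECT 2, '2016-08-16 01:27' UNION SELECT 3, '2016-08-16 02:25' UNION SELECT 3, '2016-08-16 03:25' UNION SELECT 1, '2016-08-16 02:05' UNION SELECT 1, '2016-08-16 02:20'

Thanks for the heads-up! I made a mistake in the window specification. I have edited the query to correct it.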
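
The PostgreSQL solution adapted to MSSQL:

WITH tbl AS (
  SELECT id
    -- freq60 = 1 when the previous login of the same id was at most 60 minutes earlier
    , IIF( DATEDIFF(minute,
                    LAG([timestamp], 1) OVER (PARTITION BY id ORDER BY [timestamp] ASC),
                    [timestamp] ) <= 60,
           1, 0) AS freq60
  FROM loginTable
)
-- Keeps ids with at least three such gaps; this approximates "three logins
-- within 60 minutes" and is not an exact check of a single 60-minute window.
SELECT id FROM tbl
  GROUP BY tbl.id HAVING SUM(freq60) >= 3
  ORDER BY tbl.id;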

The PostgreSQL version it was adapted from:

-- Here loginTable is assumed to have the columns cust_id and ts.
with tbl as (
  select cust_id
    ,case when extract(epoch from (ts - lag(ts, 1) over w) ) < 3600 then 1
          else 0
     end as freq60
  from loginTable
  window w as (partition by cust_id order by ts asc )
)
select cust_id
from tbl
  group by tbl.cust_id having sum(freq60) >= 3
  order by tbl.cust_id;

Per-row result for customer 1 (intval in minutes):

id  freq60  prev_ts              ts                   intval
1   0       null                 2016-08-16 00:00:00
1   1       2016-08-16 00:00:00  2016-08-16 00:25:00  25
1   1       2016-08-16 00:25:00  2016-08-16 00:47:00  22
1   1       2016-08-16 00:47:00  2016-08-16 01:05:00  18