SQL Server: select all customers who logged in at least three times within a time interval
Tags: sql-server, tsql, datetime, sql-server-2014
I have a login table that holds the customer ID and the login timestamp (customerid, timestamp). I want to get all customer IDs that logged in at least three times within 60 minutes. By the way, the login table is huge, so a self-join is not an option. For example:
customerid | timestamp
1 | 2016-08-16 00:00
2 | 2016-08-16 00:00
3 | 2016-08-16 00:00
1 | 2016-08-16 00:25
2 | 2016-08-16 01:25
3 | 2016-08-16 00:25
1 | 2016-08-16 00:47
2 | 2016-08-16 01:27
3 | 2016-08-16 02:25
3 | 2016-08-16 03:25
1 | 2016-08-16 01:05
For this example the query should return only customerid 1. Any ideas?
Using a self-join:
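-- For each login (T1), count the same customer's later logins (T2) that fall
-- within the following 60 minutes; two or more means three logins in one window.
-- Assumes a customer never has two logins with an identical Time value.
SELECT DISTINCT T1.customer_id
FROM LoginTable T1
JOIN LoginTable T2
  ON T1.customer_id = T2.customer_id
 AND T1.[Time] < T2.[Time]
 AND T2.[Time] <= DATEADD(minute, 60, T1.[Time])
GROUP BY T1.customer_id, T1.[Time]
HAVING COUNT(*) >= 2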
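Tested with rextester. Thanks.
This query returns the customer IDs that logged in at least three times during the last 60 minutes, counted back from now: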
SELECT customerid FROM
(SELECT customerid, count(*) as loginnumber FROM LoginTable
-- the WHERE filter has to come before GROUP BY
WHERE [timestamp] > DATEADD(minute, -60, GetDate())
GROUP BY customerid ) LT
WHERE loginnumber >= 3
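Hope this helps.
I adapted the PostgreSQL solution to MSSQL; you can see the results here.
I could only test on MSSQL, and the following seems to work there. Hopefully your MSSQL version also supports analytic (window) functions. With this approach no self-join is needed and the table is scanned only once.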
CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL)
INSERT INTO loginTable (id, timestamp)
SELECT 1, '2016-08-16 00:00'
UNION SELECT 2, '2016-08-16 00:00'
UNION SELECT 3, '2016-08-16 00:00'
UNION SELECT 1, '2016-08-16 00:25'
UNION SELECT 2, '2016-08-16 01:25'
UNION SELECT 3, '2016-08-16 00:25'
UNION SELECT 1, '2016-08-16 00:47'
UNION SELECT 2, '2016-08-16 01:27'
UNION SELECT 3, '2016-08-16 02:25'
UNION SELECT 3, '2016-08-16 03:25'
UNION SELECT 1, '2016-08-16 01:05';
select id, min_t, max_t from (
select id,
min(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as min_t,
max(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as max_t,
count(timestamp) over (partition by id order by id, timestamp rows between 2 preceding and current row) as num_t
from loginTable
) ts_data
where ABS(DATEDIFF(minute,min_t,max_t)) <= 60 and num_t=3;
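Thanks to @Salvador for sharing the test script.
Explanation
The idea is to scan the login table only once, in timestamp order, keeping the three most recent logins of each id in memory.
If the earliest and latest of those three timestamps are no more than 60 minutes apart, we almost have the result.
Finally, there is one edge case to handle:
On a customer's first or second login the window holds fewer than three rows, so the minimum and maximum timestamps can easily lie within 60 minutes (for the first login they are identical).
Those rows do not meet the OP's requirement of three distinct logins, so we also count the rows in the window and require exactly three (num_t = 3).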
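The explanation above can be sketched outside the database. The snippet below is only an illustration, written in Python so it runs without a server; the function name `customers_with_burst` and its parameters are made up for this sketch and are not part of the answer. It sorts each customer's logins and compares every login with the one two positions earlier, which covers the same three rows as a frame of 2 preceding plus the current row.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def customers_with_burst(logins, n=3, window=timedelta(minutes=60)):
    """Return the sorted ids that have at least n logins within `window`.

    `logins` is an iterable of (customer_id, datetime) pairs.
    Hypothetical helper for illustration only.
    """
    by_id = defaultdict(list)
    for cust_id, ts in logins:
        by_id[cust_id].append(ts)

    hits = set()
    for cust_id, stamps in by_id.items():
        stamps.sort()
        # Starting at index n-1 guarantees the window is full, which plays
        # the role of the num_t = 3 check in the SQL query.
        for i in range(n - 1, len(stamps)):
            # stamps[i - (n - 1)] is the oldest login in the window (min_t),
            # stamps[i] is the newest one (max_t).
            if stamps[i] - stamps[i - (n - 1)] <= window:
                hits.add(cust_id)
                break
    return sorted(hits)


def t(text):
    """Parse a 'YYYY-MM-DD HH:MM' string from the sample data."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# Sample rows from the question.
SAMPLE = [
    (1, t("2016-08-16 00:00")), (2, t("2016-08-16 00:00")),
    (3, t("2016-08-16 00:00")), (1, t("2016-08-16 00:25")),
    (2, t("2016-08-16 01:25")), (3, t("2016-08-16 00:25")),
    (1, t("2016-08-16 00:47")), (2, t("2016-08-16 01:27")),
    (3, t("2016-08-16 02:25")), (3, t("2016-08-16 03:25")),
    (1, t("2016-08-16 01:05")),
]

print(customers_with_burst(SAMPLE))  # [1]
```

Only customer 1 qualifies: the logins at 00:00, 00:25 and 00:47 span 47 minutes, while the tightest three-login spans for customers 2 and 3 are 87 and 145 minutes.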
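Edit
Thanks again to Salvador for the heads-up.
The first version had a bug in the window specification: I had written rows between 3 preceding and current row. I do need to look at three rows, but the current row is one of them, so the frame should be rows between 2 preceding and current row.
A simple way to find the cases that do not cross an hour boundary is: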
select
id,
datepart(yy,timestamp) as yy,
datepart(mm,timestamp) as mm,
datepart(dd,timestamp) as dd,
datepart(hh,timestamp) as hh,
count(*)
from
logintable
group by
id,
datepart(yy,timestamp),
datepart(mm,timestamp),
datepart(dd,timestamp),
datepart(hh,timestamp)
having
count(*) >= 3
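To make the limitation of hour buckets concrete, here is a small Python sketch of the same grouping. The helper name `hour_bucket_hits` and the two data sets are invented for this illustration. It counts logins per customer and clock hour, as the query does, and shows that three logins only 15 minutes apart go undetected when they straddle an hour boundary.

```python
from collections import Counter
from datetime import datetime


def hour_bucket_hits(logins, n=3):
    """Ids with at least n logins inside one clock hour (hypothetical helper)."""
    counts = Counter(
        # Truncating to the hour mirrors grouping by yy, mm, dd, hh.
        (cust_id, ts.replace(minute=0, second=0, microsecond=0))
        for cust_id, ts in logins
    )
    return sorted({cust_id for (cust_id, _), c in counts.items() if c >= n})


# Three logins within 15 minutes, split 2 + 1 across the 00:00 and 01:00 buckets.
STRADDLING = [
    (1, datetime(2016, 8, 16, 0, 50)),
    (1, datetime(2016, 8, 16, 0, 55)),
    (1, datetime(2016, 8, 16, 1, 5)),
]

# Three logins that all fall inside the 00:00 bucket.
SAME_HOUR = [
    (7, datetime(2016, 8, 16, 0, 5)),
    (7, datetime(2016, 8, 16, 0, 20)),
    (7, datetime(2016, 8, 16, 0, 40)),
]

print(hour_bucket_hits(STRADDLING))  # []
print(hour_bucket_hits(SAME_HOUR))   # [7]
```

The bucketed count never reports a false positive, because any three logins in the same clock hour are less than 60 minutes apart, but it can miss real matches.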
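If your table is very large, you could first narrow it down to the customers with at least three logins on a single day and then self-join that subset. It will still miss logins that span a day boundary, but it is a simple solution that can keep you going while you work on a more sophisticated one.
Which RDBMS/version are you using?
SQL 2014 Web Edition.
Wouldn't this return a customer who visited the site twice a day on three days?
Looks like SQL Server, as mentioned in the comments.
Yes, I know. So I assume he can easily reproduce the same behaviour in MS SQL?
Actually MySQL is not good at this either; I ended up migrating to PostgreSQL for that reason. Its window functions make this kind of correlation analysis easier.
SQL Server's SQL dialect is quite different from MySQL's, so it is not easy to port. A PostgreSQL window-based solution would probably be easier to migrate.
I have never seen a query like this, I like it! However, trying it with this set returns no results: SELECT 1, '2016-08-16 00:00' UNION SELECT 2, '2016-08-16 00:00' UNION SELECT 3, '2016-08-16 00:00' UNION SELECT 1, '2016-08-16 01:25' UNION SELECT 2, '2016-08-16 01:25' UNION SELECT 3, '2016-08-16 01:47' UNION SELECT 2, '2016-08-16 01:27' UNION SELECT 3, '2016-08-16 02:25' UNION SELECT 3, '2016-08-16 03:25' UNION SELECT 1, '2016-08-16 02:05' UNION SELECT 1, '2016-08-16 02:20'
Thanks for the heads-up! I made a mistake in the window specification. I have edited the query to correct it.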
CREATE TABLE loginTable (id INT NOT NULL, timestamp DATETIME NOT NULL)
INSERT INTO loginTable (id, timestamp)
SELECT 1, '2016-08-16 00:00'
UNION SELECT 2, '2016-08-16 00:00'
UNION SELECT 3, '2016-08-16 00:00'
UNION SELECT 1, '2016-08-16 00:25'
UNION SELECT 2, '2016-08-16 01:25'
UNION SELECT 3, '2016-08-16 00:25'
UNION SELECT 1, '2016-08-16 00:47'
UNION SELECT 2, '2016-08-16 01:27'
UNION SELECT 3, '2016-08-16 02:25'
UNION SELECT 3, '2016-08-16 03:25'
UNION SELECT 1, '2016-08-16 01:05';
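-- SQL Server version. Counts, per id, the consecutive logins that are at most
-- 60 minutes apart. Note: three such gaps are not the same condition as three
-- logins inside a single 60-minute window.
WITH tbl AS (
SELECT id
, IIF( DATEDIFF(minute,
lag([timestamp], 1) OVER (PARTITION BY id ORDER BY [timestamp] asc ),
[timestamp] )<=60,
1, 0) as freq60
FROM loginTable
)
SELECT id FROM tbl
GROUP BY tbl.id HAVING SUM(freq60) >=3
ORDER BY tbl.id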
-- PostgreSQL original; assumes the table's columns are named cust_id and ts.
with tbl as (
select cust_id
,case when extract(epoch from (ts - lag(ts, 1) over w) ) < 3600 then 1
else 0
end as freq60
from loginTable
window w as (partition by cust_id order by ts asc )
)
select cust_id
from tbl
group by tbl.cust_id having sum(freq60) >=3
order by tbl.cust_id
id  freq60  prev_ts              ts                   intval
1   0       null                 2016-08-16 00:00:00
1   1       2016-08-16 00:00:00  2016-08-16 00:25:00  25
1   1       2016-08-16 00:25:00  2016-08-16 00:47:00  22
1   1       2016-08-16 00:47:00  2016-08-16 01:05:00  18
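Summing freq60 counts consecutive gaps of at most 60 minutes, which is a different condition from "three logins inside one 60-minute window". The Python sketch below is an illustration only; the helpers `gap_count` and `has_three_in_window` and both data sets are invented here. It compares the two conditions for a single customer's timestamps. The second helper looks two rows back, which in SQL would correspond to something like lag(ts, 2).

```python
from datetime import datetime, timedelta

HOUR = timedelta(minutes=60)


def gap_count(stamps):
    """Consecutive gaps of at most 60 minutes: what SUM(freq60) adds up."""
    s = sorted(stamps)
    return sum(1 for prev, cur in zip(s, s[1:]) if cur - prev <= HOUR)


def has_three_in_window(stamps):
    """True if some login is at most 60 minutes after the login two rows back."""
    s = sorted(stamps)
    return any(cur - two_back <= HOUR for two_back, cur in zip(s, s[2:]))


# Two logins ten minutes apart on each of three different days.
SPREAD = [
    datetime(2016, 8, day, hour, minute)
    for day in (16, 17, 18)
    for hour, minute in ((9, 0), (9, 10))
]

# Exactly three logins within 47 minutes.
BURST = [
    datetime(2016, 8, 16, 0, 0),
    datetime(2016, 8, 16, 0, 25),
    datetime(2016, 8, 16, 0, 47),
]

print(gap_count(SPREAD), has_three_in_window(SPREAD))  # 3 False
print(gap_count(BURST), has_three_in_window(BURST))    # 2 True
```

With a threshold of three gaps, the first customer would be reported although no three logins share an hour, and the second would be missed although all three do. This matches the concern raised in the comments about a customer who visits twice a day on three days.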