SQL column field not null between dates
Tags: sql, presto
I want to count all unique customers who were active on 2019-01-01, on the condition that they were also active during the following 3 days.
Main table
date customer_id time_spent_online_min
2019-01-01 1 5
2019-01-01 2 6
2019-01-01 3 4
2019-01-02 1 7
2019-01-02 2 5
2019-01-03 3 3
2019-01-04 1 4
2019-01-04 2 6
Output table
date total_active_customers
2019-01-01 2
This is what I have tried so far:
with cte as(
select customer_id
,date
,time_spent_online_min
from main_table
where date between date '2019-01-01' and date '2019-01-04'
and customer_id is not null)
select date
,count(distinct(customer_id)) as total_active_customers
from cte
where date = date '2019-01-01'
group by 1
I would use EXISTS logic here:
SELECT COUNT(*)
FROM main_table t1
WHERE
    date = DATE '2019-01-01' AND
    EXISTS (SELECT 1 FROM main_table t2
            WHERE t2.customer_id = t1.customer_id AND t2.date = DATE '2019-01-02') AND
    EXISTS (SELECT 1 FROM main_table t2
            WHERE t2.customer_id = t1.customer_id AND t2.date = DATE '2019-01-03') AND
    EXISTS (SELECT 1 FROM main_table t2
            WHERE t2.customer_id = t1.customer_id AND t2.date = DATE '2019-01-04');
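If a customer can have several rows on 2019-01-01, COUNT(*) in the EXISTS query would count that customer once per row. A minimal sketch that avoids this, assuming the same main_table and a DATE-typed date column, counts distinct customer IDs instead. The alias total_active_customers is taken from the question's expected output; the table aliases are only illustrative.

```sql
-- Sketch: same EXISTS logic, but safe when a customer has several rows per day.
-- Assumption: main_table.date is of type DATE, so typed DATE literals are used,
-- since Presto generally does not compare DATE with VARCHAR implicitly.
SELECT COUNT(DISTINCT t1.customer_id) AS total_active_customers
FROM main_table t1
WHERE t1.date = DATE '2019-01-01'
  AND EXISTS (SELECT 1 FROM main_table t2
              WHERE t2.customer_id = t1.customer_id
                AND t2.date = DATE '2019-01-02')
  AND EXISTS (SELECT 1 FROM main_table t3
              WHERE t3.customer_id = t1.customer_id
                AND t3.date = DATE '2019-01-03')
  AND EXISTS (SELECT 1 FROM main_table t4
              WHERE t4.customer_id = t1.customer_id
                AND t4.date = DATE '2019-01-04');
```

Like the original query, this only counts customers who were active on every one of the three following days.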
This answer assumes that a given customer has only one record per active date.
If you have only one record per day, you can use lead():
select date, count(*)
from (select t.*, lead(date, 3) over (partition by customer_id order by date) as date_3
      from main_table t
     ) t
where date_3 = date + interval '3' day
group by date;
If you can have multiple records per day, aggregate first and then use lead():
select date, count(*)
from (select t.*, lead(date, 3) over (partition by customer_id order by date) as date_3
      from (select customer_id, date, sum(time_spent_online_min) as time_spent_online_min
            from main_table t
            group by customer_id, date
           ) t
     ) t
where date = date '2019-01-01' and
      date_3 = date '2019-01-04'
group by date;
You can also easily extend this to any date by comparing date_3 with date + interval '3' day instead of a fixed date, as in the first query.
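The two ideas above (aggregating to one row per customer per day, and comparing against date + 3 days rather than a fixed date) can be combined. The following is only a sketch under the question's table and column names; it uses SELECT DISTINCT as the aggregation step because only the dates are needed here.

```sql
-- Sketch: lead()-based count for every start date, tolerant of several rows per day.
-- Step 1 (innermost): reduce main_table to one row per customer and day.
-- Step 2: look three rows ahead within each customer's ordered days.
-- Step 3: keep rows where the third following row is exactly 3 days later,
--         which means the customer was active on each of the next 3 days.
SELECT date,
       COUNT(*) AS total_active_customers
FROM (SELECT customer_id,
             date,
             LEAD(date, 3) OVER (PARTITION BY customer_id ORDER BY date) AS date_3
      FROM (SELECT DISTINCT customer_id, date
            FROM main_table
           ) d
     ) t
WHERE date_3 = date + INTERVAL '3' DAY
GROUP BY date;
```

Because lead(date, 3) skips ahead by rows, not by days, the comparison only succeeds when there are no gaps in the following three days.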
WITH
-- your input
input(dt,customer_id,time_spent_online_min) AS (
SELECT DATE '2019-01-01',1,5
UNION ALL SELECT DATE '2019-01-01',2,6
UNION ALL SELECT DATE '2019-01-01',3,4
UNION ALL SELECT DATE '2019-01-02',1,7
UNION ALL SELECT DATE '2019-01-02',2,5
UNION ALL SELECT DATE '2019-01-03',3,3
UNION ALL SELECT DATE '2019-01-04',1,4
UNION ALL SELECT DATE '2019-01-04',2,6
)
,
-- count the active days in this row and the following 3 days
count_activity AS (
SELECT
*
, COUNT(customer_id) OVER(
PARTITION BY customer_id ORDER BY dt
RANGE BETWEEN CURRENT ROW AND INTERVAL '3' DAY FOLLOWING
) AS act_count
FROM input
)
SELECT
dt
, COUNT(*) AS total_active_customers
FROM count_activity
WHERE dt = DATE '2019-01-01'
AND act_count > 2
GROUP BY dt
;
-- out dt | total_active_customers
-- out ------------+------------------------
-- out 2019-01-01 | 2
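Where a RANGE frame with an interval offset is not available, the same "active days in this row and the following 3 days" count can be sketched with a self-join. This is an alternative formulation, not the query from the answer above; it assumes the question's main_table with a DATE-typed date column, and the names activity, start_date and active_days are made up for illustration.

```sql
-- Sketch: count distinct active days in [start day, start day + 3 days]
-- per customer, using a self-join instead of a RANGE window frame.
WITH activity AS (
  SELECT s.customer_id,
         s.date AS start_date,
         COUNT(DISTINCT f.date) AS active_days   -- includes the start day itself
  FROM main_table s
  JOIN main_table f
    ON f.customer_id = s.customer_id
   AND f.date BETWEEN s.date AND s.date + INTERVAL '3' DAY
  WHERE s.date = DATE '2019-01-01'
  GROUP BY s.customer_id, s.date
)
SELECT start_date,
       COUNT(*) AS total_active_customers
FROM activity
WHERE active_days > 2        -- same threshold as act_count > 2
GROUP BY start_date;
```

On the sample data in the question, customers 1 and 2 each have three active days in the window (2019-01-01, 2019-01-02, 2019-01-04) and customer 3 has two, so this yields 2 for 2019-01-01. Counting distinct dates also keeps the result stable if a customer has several rows on the same day.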