SQL: column field not null between dates

Tags: sql, presto

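I want to count all distinct customers who were active on 2019-01-01, on the condition that they were also active during the following 3 days.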

Main table

date        customer_id   time_spent_online_min
2019-01-01  1             5
2019-01-01  2             6
2019-01-01  3             4
2019-01-02  1             7
2019-01-02  2             5
2019-01-03  3             3
2019-01-04  1             4
2019-01-04  2             6

Output table

date         total_active_customers
2019-01-01   2

This is what I have tried so far:

with cte as(

select customer_id
      ,date
      ,time_spent_online_min

from main_table
where date between date '2019-01-01' and date '2019-01-04'
and customer_id is not null)

select    date 
         ,count(distinct(customer_id)) as total_active_customers
from cte
where date = date '2019-01-01'
group by 1

I would use EXISTS logic here:

-- Date values are written as DATE literals, since Presto does not
-- compare a date column with a plain string.
SELECT COUNT(*)
FROM main_table t1
WHERE
    date = DATE '2019-01-01' AND
    EXISTS (SELECT 1 FROM main_table t2
            WHERE t2.customer_id = t1.customer_id AND t2.date = DATE '2019-01-02') AND
    EXISTS (SELECT 1 FROM main_table t2
            WHERE t2.customer_id = t1.customer_id AND t2.date = DATE '2019-01-03') AND
    EXISTS (SELECT 1 FROM main_table t2
            WHERE t2.customer_id = t1.customer_id AND t2.date = DATE '2019-01-04');

This answer assumes that a given customer has only one record per active date.
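If that one-record-per-day assumption does not hold, a minimal sketch of a variant is to count distinct customer IDs instead of rows. The table and column names are taken from the question; the output alias total_active_customers is borrowed from the question's output table, and the DATE literals assume a Presto-style dialect.

```sql
-- Sketch: same EXISTS checks, but each customer is counted once
-- even when main_table holds several rows per customer per day.
SELECT COUNT(DISTINCT t1.customer_id) AS total_active_customers
FROM main_table t1
WHERE t1.date = DATE '2019-01-01'
  AND EXISTS (SELECT 1 FROM main_table t2
              WHERE t2.customer_id = t1.customer_id
                AND t2.date = DATE '2019-01-02')
  AND EXISTS (SELECT 1 FROM main_table t2
              WHERE t2.customer_id = t1.customer_id
                AND t2.date = DATE '2019-01-03')
  AND EXISTS (SELECT 1 FROM main_table t2
              WHERE t2.customer_id = t1.customer_id
                AND t2.date = DATE '2019-01-04');
```

As in the original query, a customer is counted only when a row exists on each of the three following days.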

If you have only one record per day, you can use lead():

select date, count(*)
from (select t.*, lead(date, 3) over (partition by customer_id order by date) as date_3
      from main_table t
     ) t
where date_3 = date + interval '3' day
group by date;

If you can have multiple records per day, aggregate first and then use lead():

select date, count(*)
from (select t.*, lead(date, 3) over (partition by customer_id order by date) as date_3
      from (select customer_id, date, sum(time_spent_online_min) as time_spent_online_min
            from main_table t
            group by customer_id, date
           ) t
     ) t
where date = date '2019-01-01' and
      date_3 = date '2019-01-04'
group by date;

You can also easily extend this to any date:

select date, count(*)
from (select t.*, lead(date, 3) over (partition by customer_id order by date) as date_3
      from main_table t
     ) t
where date_3 = date + interval '3' day
group by date;
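
To see what the filter on date_3 actually tests, it may help to run the inner query on its own. The sketch below reuses main_table and its columns from the question and assumes one row per customer per day. lead(date, 3) returns the date of the third later row for the same customer, or NULL when fewer than three later rows exist, so date_3 equals date plus 3 days only when the customer has a row on each of the three following days.

```sql
-- Sketch: inspect the per-row result of lead() before filtering.
SELECT customer_id,
       date,
       lead(date, 3) OVER (PARTITION BY customer_id ORDER BY date) AS date_3
FROM main_table
ORDER BY customer_id, date;
```

On the sample rows in the question no customer has four records, so date_3 would be NULL on every row.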

WITH
-- your input
input(dt,customer_id,time_spent_online_min) AS (
          SELECT DATE '2019-01-01',1,5
UNION ALL SELECT DATE '2019-01-01',2,6
UNION ALL SELECT DATE '2019-01-01',3,4
UNION ALL SELECT DATE '2019-01-02',1,7
UNION ALL SELECT DATE '2019-01-02',2,5
UNION ALL SELECT DATE '2019-01-03',3,3
UNION ALL SELECT DATE '2019-01-04',1,4
UNION ALL SELECT DATE '2019-01-04',2,6
)
,
-- count the active days in this row and the following 3 days
count_activity AS (
  SELECT
    *
  , COUNT(customer_id) OVER(
      PARTITION BY customer_id ORDER BY dt
      RANGE BETWEEN CURRENT ROW AND INTERVAL '3' DAY FOLLOWING
    ) AS act_count
  FROM input
)
SELECT
  dt
, COUNT(*) AS total_active_customers
FROM count_activity
WHERE dt = DATE '2019-01-01'
  AND act_count > 2
GROUP BY dt
;
-- out      dt     | total_active_customers 
-- out ------------+------------------------
-- out  2019-01-01 |                      2
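
Where RANGE frames with interval offsets are not available, the same counting idea can be sketched with a self-join instead of a window function. This is a swapped-in technique, not the query above: it assumes the question's main_table with a date-typed column named date, and it counts distinct active dates in the window from the start date through three days later.

```sql
-- Sketch: self-join stand-in for the RANGE window count.
SELECT a.date,
       COUNT(*) AS total_active_customers
FROM (
    SELECT t1.customer_id, t1.date
    FROM main_table t1
    JOIN main_table t2
      ON t2.customer_id = t1.customer_id
     AND t2.date BETWEEN t1.date AND t1.date + INTERVAL '3' DAY
    WHERE t1.date = DATE '2019-01-01'
    GROUP BY t1.customer_id, t1.date
    -- more than 2 distinct active dates, start date included
    HAVING COUNT(DISTINCT t2.date) > 2
) a
GROUP BY a.date;
```

On the question's sample rows, customers 1 and 2 each have three active dates in the window and customer 3 has two, so this should yield the same count of 2.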