Monthly cohort analysis in Amazon Redshift

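I'm trying to build a cohort analysis of monthly retention, but I'm having trouble getting the month number column right. The month number should give the month in which a user transacted, relative to sign-up: 0 for the sign-up month, 1 for the first month after sign-up, 2 for the second month, and so on up to the last month. At the moment, however, it returns negative numbers in some cells.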

The table should look like this:

cohort_month  total_users  month_number  percentage
------------  -----------  ------------  ----------
January       100          0             40
January       341          1             90
January       115          2             90
February      103          0             73
February      100          1             40
March         90           0             90

Here is the SQL:

with cohort_items as (
  select
    extract(month from insert_date) as cohort_month,
    msisdn as user_id
  from mfscore.t_um_user_detail where extract(year from insert_date)=2020
  order by 1, 2
),


user_activities as (
  select
    A.sender_msisdn,
    extract(month from A.insert_date)-C.cohort_month  as month_number
  from mfscore.t_wm_transaction_logs A
  left join cohort_items C ON A.sender_msisdn = C.user_id
  where extract(year from A.insert_date)=2020
  group by 1, 2
),

cohort_size as (
  select cohort_month, count(1) as num_users
  from cohort_items
  group by 1
  order by 1
),

B as (
  select
    C.cohort_month,
    A.month_number,
    count(1) as num_users
  from user_activities A
  left join cohort_items C ON A.sender_msisdn = C.user_id
  group by 1, 2
)

select
  B.cohort_month,
  S.num_users as total_users,
  B.month_number,
  B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
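
A negative month_number can only come from a transaction whose month is earlier than the cohort month it was joined to. Two ways that can happen with this query, both assumptions about the data rather than anything the question confirms, are an msisdn with more than one row in t_um_user_detail (so each transaction is joined to several registration dates) and transactions logged before the user's insert_date. The sketch below reproduces the first case with Python's built-in sqlite3, using hypothetical, simplified table names and made-up rows, then keeps one sign-up date per user with MIN and computes the offset from both year and month so it also works across a year boundary. SQLite's strftime stands in for Redshift's EXTRACT; in Redshift, DATEDIFF(month, signup_date, insert_date) should give the same calendar-month count.

```python
import sqlite3

# Hypothetical, simplified stand-ins for mfscore.t_um_user_detail and
# mfscore.t_wm_transaction_logs; dates are ISO strings so SQLite can parse them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_detail (msisdn TEXT, insert_date TEXT);
CREATE TABLE transaction_logs (sender_msisdn TEXT, insert_date TEXT);
INSERT INTO user_detail VALUES
  ('a', '2020-01-15'), ('a', '2020-03-02'),  -- user a appears twice
  ('b', '2020-02-10');
INSERT INTO transaction_logs VALUES
  ('a', '2020-01-20'), ('a', '2020-02-05'),
  ('b', '2020-02-11'), ('b', '2020-04-01');
""")

# Same idea as the question: transaction month minus registration month,
# joined on every user_detail row. User a's February transaction is also
# matched to the March row, which yields a negative offset.
naive = conn.execute("""
SELECT DISTINCT t.sender_msisdn,
       CAST(strftime('%m', t.insert_date) AS INTEGER)
     - CAST(strftime('%m', u.insert_date) AS INTEGER) AS month_number
FROM transaction_logs t
JOIN user_detail u ON t.sender_msisdn = u.msisdn
ORDER BY 1, 2
""").fetchall()

# One sign-up date per user (MIN) and a year-aware calendar-month offset:
# 0 = sign-up month, 1 = the month after, and so on.
fixed = conn.execute("""
WITH cohort_items AS (
  SELECT msisdn AS user_id, MIN(insert_date) AS signup_date
  FROM user_detail
  GROUP BY msisdn
)
SELECT DISTINCT t.sender_msisdn,
       (CAST(strftime('%Y', t.insert_date) AS INTEGER)
        - CAST(strftime('%Y', c.signup_date) AS INTEGER)) * 12
     + (CAST(strftime('%m', t.insert_date) AS INTEGER)
        - CAST(strftime('%m', c.signup_date) AS INTEGER)) AS month_number
FROM transaction_logs t
JOIN cohort_items c ON t.sender_msisdn = c.user_id
ORDER BY 1, 2
""").fetchall()

print(naive)  # [('a', -2), ('a', -1), ('a', 0), ('a', 1), ('b', 0), ('b', 2)]
print(fixed)  # [('a', 0), ('a', 1), ('b', 0), ('b', 2)]
```

Deduplicating only addresses the first cause; transactions that genuinely predate registration would still produce negative offsets and may need a WHERE filter or a closer look at the data.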

I think a ranking window function is the right solution. The idea is to give each user's activity a month rank, ordered by year and then by month.

For example:

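WITH activity_per_user AS (
    SELECT
        user_id,
        event_date,
        -- DENSE_RANK numbers the months 1, 2, 3, ... without gaps; RANK would
        -- skip numbers when several events fall in the same month
        DENSE_RANK() OVER (
            PARTITION BY user_id
            ORDER BY DATE_PART('year', event_date), DATE_PART('month', event_date)
        ) AS month_number
    FROM user_activities_table
    )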
The rank starts at 1, so you may need to subtract 1 to make the first month 0.
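
RANK and DENSE_RANK behave differently when a user has several events in the same month: RANK leaves a gap after a tie, DENSE_RANK does not. The sketch below checks both on hypothetical sample rows for user_activities_table, using Python's built-in sqlite3 (window functions need SQLite 3.25 or later) with strftime in place of Redshift's DATE_PART, and subtracts 1 so the first active month is 0.

```python
import sqlite3

# Hypothetical sample data for the answer's user_activities_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_activities_table (user_id TEXT, event_date TEXT);
INSERT INTO user_activities_table VALUES
  ('u1', '2020-01-03'), ('u1', '2020-01-20'),  -- two events in January 2020
  ('u1', '2020-02-07'),
  ('u1', '2021-01-15');
""")

rows = conn.execute("""
SELECT
  event_date,
  -- both ranks start at 1, so subtract 1 to make the first active month 0
  RANK() OVER (PARTITION BY user_id
               ORDER BY strftime('%Y', event_date), strftime('%m', event_date)) - 1
    AS rank_month,
  DENSE_RANK() OVER (PARTITION BY user_id
                     ORDER BY strftime('%Y', event_date), strftime('%m', event_date)) - 1
    AS dense_rank_month
FROM user_activities_table
ORDER BY event_date
""").fetchall()

for row in rows:
    print(row)
# ('2020-01-03', 0, 0)
# ('2020-01-20', 0, 0)
# ('2020-02-07', 2, 1)
# ('2021-01-15', 3, 2)
```

RANK jumps from 0 to 2 because the two January events tie. Both columns count active months rather than calendar months elapsed: the January 2021 event gets 2 under DENSE_RANK even though it is eleven months after February 2020. If the 0/1/2 in the question means months since sign-up, a year-and-month difference from the sign-up date matches that definition more closely.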

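Then you can group by user_id and month_number to get each user's number of interactions per month since subscribing (adapt this to your use case).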

The grouping query looks like this:

SELECT
    user_id,
    month_number,
    COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
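
To go from per-user month numbers to the retention percentages in the question's target table, one approach is to count distinct users per cohort and month_number and divide by the cohort size. The question's final SELECT uses B.num_users * 100 / S.num_users; when both counts are integers, Redshift (like SQLite) performs integer division and truncates, so 1 of 3 users becomes 33 rather than 33.3. The sketch below uses hypothetical tables and made-up rows in Python's built-in sqlite3 and shows both forms; multiplying by 100.0 keeps the fraction.

```python
import sqlite3

# Hypothetical inputs: each user's cohort, and the distinct
# (user, month_number) pairs in which that user was active.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort_items (user_id TEXT, cohort_month INTEGER);
CREATE TABLE user_activities (user_id TEXT, month_number INTEGER);
INSERT INTO cohort_items VALUES ('a', 1), ('b', 1), ('c', 1), ('d', 2);
INSERT INTO user_activities VALUES
  ('a', 0), ('a', 1), ('b', 0), ('c', 0), ('c', 2), ('d', 0);
""")

rows = conn.execute("""
WITH cohort_size AS (
  SELECT cohort_month, COUNT(*) AS total_users
  FROM cohort_items
  GROUP BY cohort_month
),
retained AS (
  SELECT c.cohort_month, a.month_number, COUNT(DISTINCT a.user_id) AS num_users
  FROM user_activities a
  JOIN cohort_items c ON a.user_id = c.user_id
  GROUP BY c.cohort_month, a.month_number
)
SELECT r.cohort_month,
       s.total_users,
       r.month_number,
       r.num_users * 100 / s.total_users   AS pct_integer,  -- truncated
       100.0 * r.num_users / s.total_users AS percentage    -- keeps decimals
FROM retained r
JOIN cohort_size s ON r.cohort_month = s.cohort_month
ORDER BY r.cohort_month, r.month_number
""").fetchall()

for row in rows:
    print(row)
```

Here cohort size counts every registered user, including those with no activity, which matches how the question's cohort_size CTE is built.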