Amazon Redshift: cohort analysis by month
I'm trying to build a cohort analysis of monthly retention, but I'm having trouble getting the month_number column right. month_number should return the month in which the user transacted, relative to signup: 0 for the signup month, 1 for the first month after signup, 2 for the second month, and so on up to the last month. At the moment it returns negative numbers in some cells. The result should look like this table:
cohort_month  total_users  month_number  percentage
------------  -----------  ------------  ----------
January       100          0             40
January       341          1             90
January       115          2             90
February      103          0             73
February      100          1             40
March         90           0             90
Here is the SQL:
with cohort_items as (
    select
        extract(month from insert_date) as cohort_month,
        msisdn as user_id
    from mfscore.t_um_user_detail where extract(year from insert_date)=2020
    order by 1, 2
),
user_activities as (
    select
        A.sender_msisdn,
        extract(month from A.insert_date)-C.cohort_month as month_number
    from mfscore.t_wm_transaction_logs A
    left join cohort_items C ON A.sender_msisdn = C.user_id
    where extract(year from A.insert_date)=2020
    group by 1, 2
),
cohort_size as (
    select cohort_month, count(1) as num_users
    from cohort_items
    group by 1
    order by 1
),
B as (
    select
        C.cohort_month,
        A.month_number,
        count(1) as num_users
    from user_activities A
    left join cohort_items C ON A.sender_msisdn = C.user_id
    group by 1, 2
)
select
    B.cohort_month,
    S.num_users as total_users,
    B.month_number,
    B.num_users * 100 / S.num_users as percentage
from B
left join cohort_size S ON B.cohort_month = S.cohort_month
where B.cohort_month IS NOT NULL
order by 1, 3
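I think a ranking window function is the right solution here. The idea is to assign each user's activity a rank within that user's history, ordered by year and month. For example: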
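WITH activity_per_user AS (
    SELECT
        user_id,
        event_date,
        -- DENSE_RANK rather than RANK: RANK leaves gaps after ties, so several
        -- events in the same month would inflate the numbers of later months.
        DENSE_RANK() OVER (
            PARTITION BY user_id
            ORDER BY DATE_PART('year', event_date), DATE_PART('month', event_date)
        ) AS month_number
    FROM user_activities_table
)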
Rank numbering starts at 1, so you may want to subtract 1.
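To illustrate the numbering, here is a small self-contained sketch with made-up rows. The `sample_events` CTE and its values are hypothetical and are not taken from the tables in the question. It uses DENSE_RANK, since plain RANK skips values after ties when a user has several events in the same month, and it subtracts 1 to make the numbering zero-based.

```sql
-- Hypothetical sample data: user 1 is active twice in January and once in
-- February; user 2 is active only in March.
WITH sample_events AS (
    SELECT 1 AS user_id, '2020-01-05'::date AS event_date
    UNION ALL SELECT 1, '2020-01-20'::date
    UNION ALL SELECT 1, '2020-02-03'::date
    UNION ALL SELECT 2, '2020-03-11'::date
)
SELECT
    user_id,
    event_date,
    -- DENSE_RANK starts at 1, so subtract 1 to make the first active month 0.
    DENSE_RANK() OVER (
        PARTITION BY user_id
        ORDER BY DATE_TRUNC('month', event_date)
    ) - 1 AS month_number
FROM sample_events
ORDER BY user_id, event_date;
```

With these rows, both January events of user 1 get month_number 0 and the February event gets 1, while user 2's single event gets 0. With RANK the February event would be ranked 3 before the subtraction. Note that a ranking counts active months only: a month with no activity does not advance the number, so this may or may not match the definition of month_number you need.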
You can then group by user ID and month number to get the number of interactions per user per month since signup (adapt this to your use case):
SELECT
    user_id,
    month_number,
    COUNT(1) AS n_interactions
FROM activity_per_user
GROUP BY 1, 2
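If month_number should count calendar months since signup, so that an inactive month still advances the number, one alternative to ranking is to take the month difference between the signup month and the transaction date. The following is only a sketch and is not the method proposed above. It reuses the table and column names from the question, and it assumes that `insert_date` is a date or timestamp in both tables and that each `msisdn` appears once in `t_um_user_detail`. It also filters out transactions dated before the signup month, which is one way negative values could arise.

```sql
WITH cohort_items AS (
    -- One row per user, with the first day of the signup month as the cohort.
    -- Assumption: msisdn is unique in t_um_user_detail.
    SELECT
        msisdn AS user_id,
        DATE_TRUNC('month', insert_date) AS cohort_month
    FROM mfscore.t_um_user_detail
    WHERE EXTRACT(year FROM insert_date) = 2020
),
user_activities AS (
    -- One row per user per active month, numbered from the signup month (0).
    SELECT DISTINCT
        C.user_id,
        C.cohort_month,
        DATEDIFF(month, C.cohort_month, A.insert_date) AS month_number
    FROM mfscore.t_wm_transaction_logs A
    JOIN cohort_items C ON A.sender_msisdn = C.user_id
    -- Skip transactions dated before the signup month; they would be negative.
    WHERE A.insert_date >= C.cohort_month
),
cohort_size AS (
    SELECT cohort_month, COUNT(*) AS total_users
    FROM cohort_items
    GROUP BY cohort_month
)
SELECT
    U.cohort_month,
    S.total_users,
    U.month_number,
    -- 100.0 forces decimal arithmetic instead of integer division.
    COUNT(*) * 100.0 / S.total_users AS percentage
FROM user_activities U
JOIN cohort_size S ON U.cohort_month = S.cohort_month
GROUP BY U.cohort_month, S.total_users, U.month_number
ORDER BY U.cohort_month, U.month_number;
```

Because the cohort is kept as a truncated date rather than a bare month number, the difference stays correct across year boundaries. If the results should cover 2020 only, a year filter on `A.insert_date` can be added to `user_activities`.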