Monthly retention in Amazon Redshift
I am trying to calculate monthly retention in Amazon Redshift and came up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
       EXTRACT(month FROM activity.created_at) AS Month,
       COUNT(DISTINCT activity.member_id) AS active_users,
       COUNT(DISTINCT future_activity.member_id) AS retained_users,
       COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
  LEFT JOIN ads.fbs_page_view_staging AS future_activity
         ON activity.mongo_id = future_activity.mongo_id
        AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
         Month
ORDER BY Year,
         Month
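For some reason, this query returns zero retained_users and zero retention. I would greatly appreciate any help on why this happens, or a completely different query for monthly retention that would work.
I also modified the query based on another SO post, as follows:
Query 2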
WITH t AS (
   SELECT member_id
         ,date_trunc('month', created_at) AS month
         ,count(*) AS item_transactions
         ,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
                                                     ORDER BY date_trunc('month', created_at))
          = date_trunc('month', created_at) - interval '1 month'
          OR NULL AS repeat_transaction
   FROM   ads.fbs_page_view_staging
   WHERE  created_at >= '2016-01-01'::date
   AND    created_at <  '2016-04-01'::date -- time range of interest.
   GROUP  BY 1, 2
   )
SELECT month
      ,sum(item_transactions) AS num_trans
      ,count(*) AS num_buyers
      ,count(repeat_transaction) AS repeat_buyers
      ,round(
          CASE WHEN sum(item_transactions) > 0
             THEN count(repeat_transaction) / sum(item_transactions) * 100
             ELSE 0
          END, 2) AS buyer_retention
FROM   t
GROUP  BY 1
ORDER  BY 1;
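I feel that Query 2 would be better than Query 1, so I would prefer to fix the error in that one.
Any help would be much appreciated.

Query 1 looks fine. I tried something similar; see below. You are using a self join on the same table (ads.fbs_page_view_staging) and the same column (created_at). Assuming mongo_id is unique, each row is joined only to itself, so datediff('month', activity.created_at, future_activity.created_at) will always return 0 and the condition datediff('month', activity.created_at, future_activity.created_at) = 1 will always be false.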
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct
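
The explanation above can be checked directly against the table. The following is a minimal diagnostic sketch, assuming the table and column names from the question; the aliases a and b and the output column names are chosen here for illustration only.

```sql
-- Is mongo_id unique? If total_rows = distinct_mongo_ids, every row
-- can only be paired with itself by a self join on mongo_id.
SELECT COUNT(*)                 AS total_rows,
       COUNT(DISTINCT mongo_id) AS distinct_mongo_ids
FROM ads.fbs_page_view_staging;

-- How many months apart are the rows paired by the mongo_id self join?
SELECT COUNT(*) AS joined_rows,
       MIN(DATEDIFF('month', a.created_at, b.created_at)) AS min_month_gap,
       MAX(DATEDIFF('month', a.created_at, b.created_at)) AS max_month_gap
FROM ads.fbs_page_view_staging AS a
JOIN ads.fbs_page_view_staging AS b
  ON a.mongo_id = b.mongo_id;
```

If mongo_id is unique, both gaps come back as 0, which means the "= 1" condition in Query 1 can never match and the LEFT JOIN leaves future_activity.member_id NULL on every row.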
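
I think the problem with Query 1 is that you have put the interval condition datediff('month', activity.created_at, future_activity.created_at) = 1 into the join. I don't think that works. The join therefore fails, so the values on the right side of the join are NULL, which results in the zero counts. What happens when you move the condition to the WHERE clause?

Exactly right, I joined on the wrong column: it should be member_id, not mongo_id. It was a silly mistake on my part. Thanks for pointing it out.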
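
For reference, this is roughly what Query 1 looks like with the join column changed from mongo_id to member_id. It is a sketch rather than a tested query: the output aliases activity_year and activity_month are renamed here to avoid clashing with the year and month keywords, and everything else is taken from the question.

```sql
SELECT EXTRACT(year FROM activity.created_at)  AS activity_year,
       EXTRACT(month FROM activity.created_at) AS activity_month,
       COUNT(DISTINCT activity.member_id)        AS active_users,
       COUNT(DISTINCT future_activity.member_id) AS retained_users,
       COUNT(DISTINCT future_activity.member_id)::float
         / COUNT(DISTINCT activity.member_id)    AS retention
FROM ads.fbs_page_view_staging AS activity
  LEFT JOIN ads.fbs_page_view_staging AS future_activity
         -- join on the member, not on the per-row mongo_id
         ON activity.member_id = future_activity.member_id
        -- keep only activity that falls in the following calendar month
        AND DATEDIFF('month', activity.created_at, future_activity.created_at) = 1
GROUP BY 1, 2
ORDER BY 1, 2;
```

The month condition stays in the ON clause on purpose. In a WHERE clause it would discard every row where future_activity is NULL, which turns the LEFT JOIN into an inner join and drops the members who were not retained from active_users. Because the join pairs every page view of a member with every page view in the following month, the intermediate result can be large on a raw page-view table; COUNT(DISTINCT ...) keeps the counts correct regardless.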
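
Query 2 is not addressed in the replies. Reading it as written, buyer_retention divides the number of returning members by sum(item_transactions), that is, by page views rather than by members, so it is not a per-member rate. Below is a minimal sketch of a per-member variant, assuming the same table and date range as the question; the CTE names member_months and with_prev and the output column names are made up for illustration.

```sql
WITH member_months AS (
    -- one row per member per calendar month with any activity
    SELECT DISTINCT
           member_id,
           DATE_TRUNC('month', created_at) AS activity_month
    FROM ads.fbs_page_view_staging
    WHERE created_at >= '2016-01-01'
      AND created_at <  '2016-04-01'   -- time range of interest
),
with_prev AS (
    -- for each member-month, the member's previous active month (NULL if none)
    SELECT member_id,
           activity_month,
           LAG(activity_month) OVER (PARTITION BY member_id
                                     ORDER BY activity_month) AS prev_active_month
    FROM member_months
)
SELECT activity_month,
       COUNT(*) AS active_members,
       -- members whose previous active month is exactly one month earlier
       SUM(CASE WHEN prev_active_month = DATEADD(month, -1, activity_month)
                THEN 1 ELSE 0 END) AS returning_members,
       ROUND(100.0 * SUM(CASE WHEN prev_active_month = DATEADD(month, -1, activity_month)
                              THEN 1 ELSE 0 END) / COUNT(*), 2) AS returning_pct
FROM with_prev
GROUP BY 1
ORDER BY 1;
```

This measures the share of a month's active members who were also active in the previous month, which looks backward, whereas Query 1 looks forward to the following month. The first month in the date range always shows zero returning members, because the filter excludes any earlier activity.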