How to calculate full/repeat retention in BigQuery SQL


sql google-bigquery


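I'm trying to calculate "rolling retention" or "repeat retention" (I'm not sure what the proper name is), where I only want to count the share of users who placed an order in every consecutive month.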

So if 10 users placed an order in January 2020 and 5 of them came back in February, that would be a 50% retention rate.

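Now, for March, I only want to consider the 5 users who ordered in February, while still dividing by the total size of the January cohort.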

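So if 2 of those February users return in March, the March retention rate would be 2/10 = 20%. If a January user did not come back in February but did place an order in March, they would not be included in the March calculation, because they did not return in February.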

Basically, this retention rate should gradually decrease toward 0% and never increase.
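To pin down the definition, here is a minimal plain-Python sketch (not BigQuery) that reproduces the 10 → 5 → 2 example. The function `streak_retention` and its input shape, a dict from a month index to the set of users who ordered in that month, are hypothetical and chosen only for illustration. The numerator shrinks by set intersection, so a user who skips a month can never re-enter, while the denominator stays fixed at the cohort size.

```python
from __future__ import annotations


def streak_retention(monthly_buyers: dict[int, set[str]],
                     cohort: set[str], start: int) -> list[float]:
    """Share of `cohort` that ordered in every month from `start` onward.

    monthly_buyers maps a month index (e.g. 0 = Jan 2020) to the set of
    users who ordered that month (hypothetical input shape).
    """
    size = len(cohort)
    still_active = set(cohort)  # users whose streak is still unbroken
    rates = []
    month = start
    while still_active:
        # Keep only users who also ordered this month; anyone who missed
        # a month was already dropped and cannot come back.
        still_active &= monthly_buyers.get(month, set())
        rates.append(len(still_active) / size)  # always divide by cohort size
        month += 1
    return rates


cohort = {f"u{i}" for i in range(10)}  # 10 users ordered in January 2020
buyers = {
    0: cohort,                              # January
    1: {"u0", "u1", "u2", "u3", "u4"},      # 5 come back in February
    2: {"u0", "u1", "u7"},                  # u7 skipped February
}
print(streak_retention(buyers, cohort, start=0))  # [1.0, 0.5, 0.2, 0.0]
```

User `u7` orders in March without having ordered in February, and is correctly left out of the 20%.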

Here is what I have so far:

WITH first_order AS (
  SELECT
    customerEmail,
    MIN(orderedat) as firstOrder,
  FROM fact AS fact
  GROUP BY 1
),

cohort_data AS (
  SELECT
    first_order.customerEmail,
    orderedAt as order_month,
    MIN(FORMAT_DATE("%y-%m (%b)", date(firstorder))) as cohort_month,
  FROM first_order as first_order
  LEFT JOIN fact as fact
    ON first_order.customeremail = fact.customeremail
  GROUP BY 1, 2, FACT.orderedAt
),

cohort_count AS (
  SELECT cohort_month, count(distinct customeremail) AS total_cohort_count
  FROM cohort_data
  GROUP BY 1
)

SELECT
  cd.cohort_month,
  date_trunc(date(cd.order_month), month) as order_month,
  total_cohort_count,
  count(distinct cd.customeremail) as total_repeat
FROM cohort_data as cd
JOIN cohort_data as last_month
  ON cd.customeremail = last_month.customeremail
  and date(cd.order_month) = date_add(date(last_month.order_month), interval 1 month)
LEFT JOIN cohort_count AS cc
  on cd.cohort_month = cc.cohort_month
GROUP BY 1, 2, 3
ORDER BY cohort_month, order_month ASC

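Here are the results. I'm not sure where I went wrong, but the numbers are too small, and in some months the retention rate goes up when it shouldn't.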

In the final query I used an inner join to compare each month with the previous month, but it doesn't work quite the way I want.

Sample data:

Any help would be much appreciated.
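Two features of the query above are consistent with the symptoms described. First, `order_month` is the raw `orderedAt` value rather than a month bucket, so `date(cd.order_month) = date_add(date(last_month.order_month), interval 1 month)` only matches two orders placed exactly one month apart to the day, which plausibly explains why the counts look too small. Second, the self-join only compares a month with the month before it, so a customer who skipped a month is counted again as soon as they order in two consecutive months, which lets "retention" rise. The plain-Python sketch below illustrates only the second point; the user ids and month indexes are made up.

```python
# Hypothetical order months per user (0 = January, 1 = February, ...).
orders = {
    "a": {0, 1, 2, 3},  # ordered every month
    "b": {0, 2, 3},     # skipped February, then came back
    "c": {0, 1},        # stopped after February
}


def previous_month_only(orders, month):
    """Users who ordered in `month` and in `month - 1` (what the self-join checks)."""
    return {u for u, ms in orders.items() if month in ms and month - 1 in ms}


def unbroken_streak(orders, first, month):
    """Users who ordered in every month from `first` through `month`."""
    return {u for u, ms in orders.items()
            if all(m in ms for m in range(first, month + 1))}


for month in (1, 2, 3):
    print(month, len(previous_month_only(orders, month)),
          len(unbroken_streak(orders, 0, month)))
# 1 2 2
# 2 1 1
# 3 2 1
```

In month 3 the previous-month check counts user `b` again (2 users, up from 1), while the unbroken-streak count stays at 1.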

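I would start with one row per customer per month. Then I would enumerate each customer's months, keep only the ones with no gap since the first month, and aggregate: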

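with customer_months as (
      -- one row per customer per calendar month (column names as in the question's fact table)
      select customerEmail as customer_email,
             date_trunc(date(orderedAt), month) as yyyymm
      from fact
      group by 1, 2
     )
select first_yyyymm, yyyymm, count(*) as num_customers
from (select cm.*,
             -- the customer's first order month, i.e. their cohort
             min(yyyymm) over (partition by customer_email) as first_yyyymm,
             -- 1, 2, 3, ... over the customer's distinct order months
             row_number() over (partition by customer_email order by yyyymm) as seqnum
      from customer_months cm
     ) cm
-- the n-th month equals first + (n - 1) months only if no month was skipped
where yyyymm = date_add(first_yyyymm, interval seqnum - 1 month)
group by 1, 2
order by 1, 2;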
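The answer stops at per-month counts. As a runnable check of the same gaps logic, here is a Python sketch that runs an SQLite translation of it on a tiny, invented `fact` table and adds a retention column by dividing each month's count by the cohort's first-month count, which is the full cohort size because every customer orders in their own first month. SQLite is used only so the example runs locally: `date(x, 'start of month')` and `date(x, '+N months')` stand in for BigQuery's `DATE_TRUNC` and `DATE_ADD`, and the rows and e-mail addresses are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (customerEmail TEXT, orderedAt TEXT)")
conn.executemany("INSERT INTO fact VALUES (?, ?)", [
    ("a@x.com", "2020-01-05"), ("a@x.com", "2020-02-10"), ("a@x.com", "2020-03-02"),
    ("b@x.com", "2020-01-20"), ("b@x.com", "2020-03-15"),  # skipped February
    ("c@x.com", "2020-01-08"), ("c@x.com", "2020-02-28"),
    ("d@x.com", "2020-02-03"),                             # February cohort
])

QUERY = """
WITH customer_months AS (
    -- one row per customer per calendar month
    SELECT customerEmail AS customer_email,
           date(orderedAt, 'start of month') AS yyyymm
    FROM fact
    GROUP BY 1, 2
),
numbered AS (
    SELECT cm.*,
           MIN(yyyymm) OVER (PARTITION BY customer_email) AS first_yyyymm,
           ROW_NUMBER() OVER (PARTITION BY customer_email ORDER BY yyyymm) AS seqnum
    FROM customer_months cm
),
streaks AS (
    -- keep a month only if no month is missing since the customer's first one
    SELECT first_yyyymm, yyyymm, COUNT(*) AS retained
    FROM numbered
    WHERE yyyymm = date(first_yyyymm, '+' || (seqnum - 1) || ' months')
    GROUP BY 1, 2
)
SELECT first_yyyymm, yyyymm, retained,
       -- the cohort's first month holds the full cohort size
       1.0 * retained / FIRST_VALUE(retained) OVER (
           PARTITION BY first_yyyymm ORDER BY yyyymm) AS retention
FROM streaks
ORDER BY 1, 2
"""

for row in conn.execute(QUERY):
    print(row)
# ('2020-01-01', '2020-01-01', 3, 1.0)
# ('2020-01-01', '2020-02-01', 2, 0.6666666666666666)
# ('2020-01-01', '2020-03-01', 1, 0.3333333333333333)
# ('2020-02-01', '2020-02-01', 1, 1.0)
```

Customer `b` counts toward the January cohort size but not toward its March retention, matching the requirement in the question. A month in which nobody from a cohort is retained produces no row rather than a 0. BigQuery also provides `FIRST_VALUE`, so the final SELECT should carry over in spirit, but this sketch has only been reasoned through for SQLite.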

Sample data would be helpful.
@Gordon I've added some sample data :)