Google BigQuery - Calculating monthly totals by status based on multiple date conditions

I have a table with the following data:

    customer_id  subscription_id  plan   status     trial_start  trial_end   activated_at  cancelled_at
    1            jg1              basic  cancelled  2020-06-26   2020-07-14  2020-07-14    2020-09-25
    2            ab1              basic  cancelled  2020-08-10   2020-08-24  2020-08-24    2021-02-15
    3            cf8              basic  cancelled  2020-08-25   2020-09-04  2020-09-04    2020-10-24
    4            bc2              basic  active     2020-10-12   2020-10-26  2020-10-26
    5            hg4              basic  active     2021-01-09   2021-02-08  2021-02-08
    6            cd5              basic  in_trial   2021-02-26

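As you can see from the table, when a subscription is in its trial period, status = in_trial. When the subscription converts from in_trial to active, status = active and a date appears in activated_at. When an in_trial or active subscription is cancelled, the status switches to cancelled and a date appears in cancelled_at. The status column always shows only the latest status of the subscription. No new row is added when the status changes. Instead, the status is updated in place and the corresponding date column records when the change happened.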

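My goal is to count, for each month, how many subscriptions were in trial, how many were active, and how many were cancelled. Because the status column reflects only the latest status of a subscription, the query has to work out from the available date columns how many subscriptions had status = in_trial, status = active, and status = cancelled in each month.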

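If a particular subscription had more than one status within a given month (for example, subscription_id = ab1 was in trial in August 2020 and converted to active in August 2020), I want only the latest status to be counted for that subscription. For example, subscription_id = ab1 should be counted as an active subscription in August 2020.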
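
The "latest status wins" rule can be illustrated with a minimal sketch. It assumes the statuses are encoded as numbers in lifecycle order (0 = in_trial, 1 = active, 2 = cancelled), so that the latest status within a month is simply the highest code. The encoding and the inline `monthly_status` rows are hypothetical and serve only as an illustration:

```sql
-- Hypothetical rows for subscription ab1, which is both in trial and
-- active during August 2020.
-- Assumed encoding: 0 = in_trial, 1 = active, 2 = cancelled.
with monthly_status as (
  select 'ab1' as subscription_id, date '2020-08-01' as month_start, 0 as status_code union all
  select 'ab1', date '2020-08-01', 1
)
select subscription_id, month_start,
  max(status_code) as latest_status_code  -- the later lifecycle stage wins
from monthly_status
group by subscription_id, month_start
```

For ab1 this yields a single August 2020 row with code 1, meaning the subscription is counted as active for that month.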

My desired output is:

    date          in_trial   active    cancelled
   2020-06-01         1        0           0
   2020-07-01         0        1           0
   2020-08-01         1        2           0
   2020-09-01         0        2           1         
   2020-10-01         0        2           1 
   2020-11-01         0        2           0
   2020-12-01         0        2           0 
   2021-01-01         1        2           0
   2021-02-01         1        2           1
   2021-03-01         1        2           0
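
Alternatively, the result can be presented in a different format, as long as the numbers are correct. Another example of the output could be: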

   date           status      count
2020-06-01       in_trial       1
2020-06-01        active        0
2020-06-01       cancelled      0
2020-07-01       in_trial       0
2020-07-01        active        1
2020-07-01       cancelled      0
   ...             ...         ...
2021-03-01       in_trial       1
2021-03-01        active        2
2021-03-01       cancelled      0
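
The two layouts carry the same information. As a rough sketch, assuming the wide result is available as a table or CTE, BigQuery's UNPIVOT operator can turn it into the long layout. The inline `wide` CTE below is hypothetical and holds only the first two rows of the desired output; the column name `subscription_count` is also an assumption:

```sql
-- Hypothetical wide result: two rows copied from the desired output.
with wide as (
  select date '2020-06-01' as month_start, 1 as in_trial, 0 as active, 0 as cancelled union all
  select date '2020-07-01', 0, 1, 0
)
-- One output row per (month_start, status) pair; the status column holds
-- the original column names as strings.
select month_start, status, subscription_count
from wide
unpivot(subscription_count for status in (in_trial, active, cancelled))
order by month_start
```

Each wide row becomes three long rows, one per status column.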

Here is a query that can be used to reproduce the sample table provided in this question:

SELECT 1 AS customer_id, 'jg1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-06-26' AS trial_start, '2020-07-14' AS trial_end, '2020-07-14' AS activated_at, '2020-09-25' AS cancelled_at UNION ALL 
SELECT 2 AS customer_id, 'ab1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-10' AS trial_start, '2020-08-24' AS trial_end, '2020-08-24' AS activated_at, '2021-02-15' AS cancelled_at UNION ALL 
SELECT 3 AS customer_id, 'cf8' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-25' AS trial_start, '2020-09-04' AS trial_end, '2020-09-04' AS activated_at, '2020-10-24' AS cancelled_at UNION ALL 
SELECT 4 AS customer_id, 'bc2' AS subscription_id, 'basic' AS plan, 'active' AS status, '2020-10-12' AS trial_start, '2020-10-26' AS trial_end, '2020-10-26' AS activated_at, '' AS cancelled_at UNION ALL 
SELECT 5 AS customer_id, 'hg4' AS subscription_id, 'basic' AS plan, 'active' AS status, '2021-01-09' AS trial_start, '2021-02-08' AS trial_end, '2021-02-08' AS activated_at, '' AS cancelled_at UNION ALL 
SELECT 6 AS customer_id, 'cd5' AS subscription_id, 'basic' AS plan, 'in_trial' AS status, '2021-02-26' AS trial_start, '' AS trial_end, '' AS activated_at, '' AS cancelled_at
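
The empty strings in the query above appear to stand in for missing dates. A variant that uses NULL and DATE literals may be easier to query, since date functions such as DATE() typically fail on an empty string. The DATE typing here is an assumption made for illustration; the real columns may have a different type:

```sql
-- Same six sample rows, with DATE literals and NULL for missing dates.
-- The first row fixes the column names and types for the whole UNION ALL.
SELECT 1 AS customer_id, 'jg1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status,
  DATE '2020-06-26' AS trial_start, DATE '2020-07-14' AS trial_end,
  DATE '2020-07-14' AS activated_at, DATE '2020-09-25' AS cancelled_at UNION ALL
SELECT 2, 'ab1', 'basic', 'cancelled', DATE '2020-08-10', DATE '2020-08-24', DATE '2020-08-24', DATE '2021-02-15' UNION ALL
SELECT 3, 'cf8', 'basic', 'cancelled', DATE '2020-08-25', DATE '2020-09-04', DATE '2020-09-04', DATE '2020-10-24' UNION ALL
SELECT 4, 'bc2', 'basic', 'active', DATE '2020-10-12', DATE '2020-10-26', DATE '2020-10-26', NULL UNION ALL
SELECT 5, 'hg4', 'basic', 'active', DATE '2021-01-09', DATE '2021-02-08', DATE '2021-02-08', NULL UNION ALL
SELECT 6, 'cd5', 'basic', 'in_trial', DATE '2021-02-26', NULL, NULL, NULL
```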

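I have been working on this since yesterday morning and have been looking for an efficient way to do it. Thanks in advance for helping me solve this.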

The query below should work for you:

-- status codes: 0 = in_trial, 1 = active, 2 = cancelled
select month, 
  count(distinct if(status = 0, customer_id, null)) in_trial, 
  count(distinct if(status = 1, customer_id, null)) active, 
  count(distinct if(status = 2, customer_id, null)) cancelled
from (
  -- keep only the highest (latest) status per customer and month
  select month, customer_id, 
    array_agg(status order by status desc limit 1)[offset(0)] status
  from (
    -- months in which the subscription was in trial
    select distinct customer_id, 0 status, date_trunc(date, month) month
    from `project.dataset.table`,
    unnest(generate_date_array(date(trial_start), ifnull(date(trial_end), current_date()))) date 
      union all
    -- months in which the subscription was active
    select distinct customer_id, 1 status, date_trunc(date, month) month
    from `project.dataset.table`,
    unnest(generate_date_array(date(activated_at), ifnull(date(cancelled_at), current_date()))) date 
      union all
    -- month in which the subscription was cancelled (null if never cancelled)
    select distinct customer_id, 2 status, date_trunc(date(cancelled_at), month) month
    from `project.dataset.table`
  )
  where not month is null
  group by month, customer_id
)
group by month
# order by month 

If applied to the sample data in the question, the output is:

[output not preserved]

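Clarification: is it '' as trial_end, or null as trial_end?

'' stands for a null value. Good question, I should have made that clearer.

So in your repro query it should be null, not '', right? Not particularly important, but it affects further coding. Also, the date-related columns: are they strings or of date type?

Correct, it should be null rather than an empty string. As for the date columns, they are datetime in my actual data, with precision down to the second.

Thank you, Mikhail. I am analyzing the query now to understand it. If I have any questions I will let you know. I am happy to upvote and accept the answer once the solution is clear.

In short: 1) there are three innermost queries, one for each daily status; 2) the next select computes the last status of each user in a given month; 3) the last, outermost select is a simple final aggregation of distinct customers per month for each of the three subscription statuses :o)

Thank you, Mikhail. I tried this on a subset of my data and it works very well. One question: if, for example, I want to count subscriptions rather than companies, and a company can have multiple subscriptions, how would I do that? I am now using your query to try to work this out.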
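
One possible way to count subscriptions instead of customers is to key the same three steps on subscription_id rather than customer_id. The sketch below is not from the original answer. It assumes subscription_id is unique per subscription, uses an inline `subscriptions` CTE with the sample rows typed as DATE with NULL for missing dates, and swaps the array_agg trick for max(status), which picks the same value. With DATETIME columns, the date arguments would need wrapping in date(...), as in the answer.

```sql
with subscriptions as (
  -- sample rows from the question; NULL marks a missing date (assumption)
  select 1 as customer_id, 'jg1' as subscription_id,
    date '2020-06-26' as trial_start, date '2020-07-14' as trial_end,
    date '2020-07-14' as activated_at, date '2020-09-25' as cancelled_at union all
  select 2, 'ab1', date '2020-08-10', date '2020-08-24', date '2020-08-24', date '2021-02-15' union all
  select 3, 'cf8', date '2020-08-25', date '2020-09-04', date '2020-09-04', date '2020-10-24' union all
  select 4, 'bc2', date '2020-10-12', date '2020-10-26', date '2020-10-26', null union all
  select 5, 'hg4', date '2021-01-09', date '2021-02-08', date '2021-02-08', null union all
  select 6, 'cd5', date '2021-02-26', null, null, null
),
status_months as (
  -- step 1: one row per subscription, status and month
  -- status codes: 0 = in_trial, 1 = active, 2 = cancelled
  select distinct subscription_id, 0 as status, date_trunc(d, month) as month_start
  from subscriptions,
    unnest(generate_date_array(trial_start, ifnull(trial_end, current_date()))) as d
  union all
  select distinct subscription_id, 1 as status, date_trunc(d, month) as month_start
  from subscriptions,
    unnest(generate_date_array(activated_at, ifnull(cancelled_at, current_date()))) as d
  union all
  select subscription_id, 2 as status, date_trunc(cancelled_at, month) as month_start
  from subscriptions
  where cancelled_at is not null
),
latest as (
  -- step 2: latest (highest) status per subscription and month
  select month_start, subscription_id, max(status) as status
  from status_months
  group by month_start, subscription_id
)
-- step 3: count subscriptions per month and status
select month_start,
  countif(status = 0) as in_trial,
  countif(status = 1) as active,
  countif(status = 2) as cancelled
from latest
group by month_start
order by month_start
```

Because `latest` holds exactly one row per subscription and month, a plain countif is enough and no distinct is needed. On the sample data, where every customer has exactly one subscription, the months through 2021-03 should match the desired output in the question; later months depend on current_date().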