如何在Google BigQuery SQL中预测按相同订单id分组的交易的客户流失?

如何在Google BigQuery SQL中预测按相同订单id分组的交易的客户流失?,sql,google-bigquery,lag,lead,Sql,Google Bigquery,Lag,Lead,我对BigQuery中的SQL非常陌生。我试图根据客户的订单id和开始日期预测客户状态 规则 同一订单id的客户状态必须相同 必须根据下一个订单ID(交易)或上一个订单ID预测客户状态 这是我的密码 select week_start, t.account_id, t.order_id, date_start,date_ended, from unnest(generate_date_array(date('2018-12-31'), date('2019-2-11'), interval 1

我对BigQuery中的SQL非常陌生。我试图根据客户的订单id和开始日期预测客户状态

规则

  • 同一订单id的客户状态必须相同

  • 必须根据下一个订单ID(交易)或上一个订单ID预测客户状态

  • 这是我的密码

    select week_start, t.account_id, t.order_id, date_start,date_ended,
    
    from unnest(generate_date_array(date('2018-12-31'), date('2019-2-11'), interval 1 week) )  week_start cross join
         (select distinct account_id
          from t
         ) a left join
         t
         on t.account_id = a.account_id and
            t.reporting_week_start_date = week_start
    order by  a.account_id,week_start
    
    我的代码的输出

    week_start | account_id  | order_id  | date_start   | date_ended   |
    2018-12-31 |    1        |    1001   |   2019-01-01 |  2019-01-15  |
    2019-01-07 |    1        |    1001   |   2019-01-01 |  2019-01-15  |
    2019-01-14 |    1        |    1002   |   2019-01-15 |  2019-01-27  |
    2019-01-21 |    1        |    1002   |   2019-01-15 |  2019-01-27  |
    2019-01-28 |    1        |    1003   |   2019-01-29 |  2019-02-04  |
    
    期望输出

    week_start | account_id  | order_id  | date_start   | date_ended   | Customer_Status
    2018-12-31 |    1        |    1001   |   2019-01-01 |  2019-01-15  |  First Time
    2019-01-07 |    1        |    1001   |   2019-01-01 |  2019-01-15  |  First Time
    2019-01-14 |    1        |    1002   |   2019-01-15 |  2019-01-27  |  Recurring
    2019-01-21 |    1        |    1002   |   2019-01-15 |  2019-01-27  |  Recurring
    2019-01-28 |    1        |    1003   |   2019-01-29 |  2019-02-04  |  Churned
    

    如何在大查询SQL中执行此操作?谢谢。

    在分析了您的案例之后,我找到了您可以用来对客户进行分类的规则。因为您想在BigQuery上对每个客户进行分类,所以所有规则都是硬编码的

    您提到分类应基于下一个和上一个订单id,但我也使用了账户id字段。出于这个原因,我使用了两种方法

    下面的示例代码应该合并到您当前的查询中,我使用您当前的输出作为源输入。我遵循以下算法:

    1)对首次客户进行分类。因此,我创建了临时表new_c

    2)对定期客户进行分类,然后创建rec_c临时表

    3)使用最后一张临时表搅动的客户

    4)根据源输入(您当前的输出)连接所有表,在该阶段,它被处理为复制状态,就像我们有两个“首次”客户状态一样,依此类推

    下面是代码,请注意,我使用了稍微不同的列名:

        #First selecting the new costumers and changing the namwes of some columns so we can use left join in the end
    WITH
      new_c AS (
      SELECT
        _account_id__ AS acc,
        _order_id__ AS oo,
        week_start_ AS wk_nc,
        CASE
          WHEN ((LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)IS NULL) AND (LAG(_order_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)IS NULL )) THEN 'First time'
          WHEN (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) != _account_id__ ) THEN 'First time'
      END
        AS status
      FROM
        `test-proj-261014.sample.customer` ),
    
      #Now, we classify the customers who are Recurring
      rec_c AS (
      SELECT
        _account_id__ AS acc_rc,
        _order_id__ AS oo_rc,
        week_start_ AS wk_rc,
        CASE
          WHEN ((LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ ) AND (LAG(_order_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)!= _order_id__) AND (LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) = _account_id__ ) ) THEN 'Recurring'
      END
        AS status_rc
      FROM
        `test-proj-261014.sample.customer` ),
    
      #Last, the Churned customers
      churn_c AS (
      SELECT
        _account_id__ AS acc_c,
        _order_id__ AS oo_c,
        week_start_ AS wk_ch,
        CASE
          WHEN ((LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)!= _account_id__ ) AND (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ )) THEN 'Churned'
          WHEN ((LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) IS NULL)
          AND (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ )) THEN 'Churned'
      END
        AS status_churn
      FROM
        `test-proj-261014.sample.customer` )
    
      #In the end, all the temporary tables are joined together and we handle same status customers.
    SELECT
      week_start_,
      _account_id__,
      _order_id__,
      _date_start___,
      _date_ended___,
      CASE
        WHEN status IS NOT NULL THEN status
        WHEN (LAG(status) OVER (PARTITION BY acc ORDER BY oo ASC) ='First time'
        AND LAG(oo) OVER (PARTITION BY acc ORDER BY oo ASC)=oo)THEN 'First time'
    
        WHEN status_rc IS NOT NULL THEN status_rc
        WHEN (LAG(status_rc) OVER (PARTITION BY acc_rc ORDER BY oo_rc ASC) ='Recurring'
        AND LAG(oo_rc) OVER (PARTITION BY acc ORDER BY oo ASC)=oo_rc) THEN 'Recurring'
    
        WHEN churn_c IS NOT NULL THEN status_churn
        WHEN (LAG(status_churn) OVER (PARTITION BY acc_c ORDER BY oo_c ASC) ='Churned'
        AND LAG(oo_c) OVER (PARTITION BY acc ORDER BY oo ASC)=oo_c) THEN 'Churned'
    
    END
      AS final_status
    FROM
      `test-proj-261014.sample.customer`
    INNER JOIN
      new_c
    ON
      _account_id__ = ACC
      AND _order_id__ = OO
      AND week_start_ = wk_nc
    INNER JOIN
      rec_c
    ON
      _account_id__ = acc_rc
      AND _order_id__ = oo_rc
      AND week_start_ = wk_rc
    INNER JOIN
      churn_c
    ON
      _account_id__ = acc_c
      AND _order_id__ = oo_c
      AND week_start_ = wk_ch ;
    

    定义客户状态的规则是什么?