Google BigQuery: timestamp problem when using the LEAD window function


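I'm trying to get each customer's first order, their next order, and the number of days between the two. It seems simple enough. These are the steps I took:

  • Use the MIN() and LEAD() functions to pull each customer's first and second orders
  • Run DATEDIFF on those two fields to get the difference in days

The short version of the code looks like this: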

    SELECT cust,
           MIN(ord_time) first_ord,
           LEAD(ord_time, 1) OVER (PARTITION BY cust
                                   ORDER BY ord_time) next_ord
    FROM
    (SELECT cust, ord_time
     FROM df.orders
     GROUP EACH BY cust, ord_time)
    
There are some other filters, joins, and groupings in the real query, but this is the basic block.
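For comparison, the two steps above can be sketched in BigQuery Standard SQL, where a TIMESTAMP column can be passed through LEAD() and then to DATE_DIFF(). This is a sketch that assumes df.orders has the cust and ord_time (TIMESTAMP) columns used in the question, and that the legacy-SQL type problem discussed in the answer does not carry over to Standard SQL. Instead of MIN(), it keeps the first row per customer with ROW_NUMBER().

```sql
#standardSQL
-- Sketch: assumes df.orders has cust and ord_time (TIMESTAMP), as in the question.
WITH distinct_orders AS (
  -- one row per customer and order time, like GROUP EACH BY in the question
  SELECT DISTINCT cust, ord_time
  FROM df.orders
),
ranked AS (
  SELECT
    cust,
    ord_time AS first_ord,
    -- the customer's next order time, or NULL if there is none
    LEAD(ord_time) OVER (PARTITION BY cust ORDER BY ord_time) AS next_ord,
    ROW_NUMBER() OVER (PARTITION BY cust ORDER BY ord_time) AS num
  FROM distinct_orders
)
SELECT
  cust,
  first_ord,
  next_ord,
  -- calendar days between the UTC dates of the two orders
  DATE_DIFF(DATE(next_ord), DATE(first_ord), DAY) AS diff
FROM ranked
WHERE num = 1
```

The DISTINCT plays the role of GROUP EACH BY in the original query. Without it, two orders with the same ord_time would produce a next_ord equal to first_ord and a diff of 0.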

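The output should be one field with the customer ID and two timestamp fields. The two timestamp fields look like this:

So far everything looks fine. However, when I run DATEDIFF() on these two fields, every row comes back NULL.

Also, when I hover over either timestamp field, it tells me the data type is TIMESTAMP. But when I try any kind of timestamp conversion, to seconds or anything else, the next_ord field makes the query fail with a "type unknown" error.

I'm just trying to find out what I'm doing wrong, or whether there is a way around this.

Thanks for your help.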

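I think this has to do with how window functions handle timestamps.

Here is what I am seeing so far:

1. When the source values are strings, everything works as expected: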

    SELECT 
      customer_id,
      first_ord,
      next_ord,
      DATEDIFF(next_ord, first_ord) AS diff
    FROM (
      SELECT 
        customer_id, 
        LEAD(ord_time, 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord, 
        LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
      FROM 
        (SELECT 1 AS customer_id, '2014-04-08 09:51:24 UTC' AS ord_time),
        (SELECT 1 AS customer_id, '2014-04-08 09:53:31 UTC' AS ord_time),
        (SELECT 1 AS customer_id, '2014-05-08 09:53:31 UTC' AS ord_time),
        (SELECT 2 AS customer_id, '2014-09-12 17:20:43 UTC' AS ord_time),
        (SELECT 2 AS customer_id, '2015-04-16 21:44:18 UTC' AS ord_time)
    )
    WHERE num = 1
    
Result:

    customer_id  first_ord                next_ord                 diff
    1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  0
    2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  216
    
2. When the source values are timestamps, the result is NULL, as you described in your question:

    SELECT 
      customer_id,
      first_ord,
      next_ord,
      DATEDIFF(next_ord, first_ord) AS diff
    FROM (
      SELECT 
        customer_id, 
        LEAD(ord_time, 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord, 
        LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
      FROM 
        (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:51:24 UTC') AS ord_time),
        (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:53:31 UTC') AS ord_time),
        (SELECT 1 AS customer_id, TIMESTAMP('2014-05-08 09:53:31 UTC') AS ord_time),
        (SELECT 2 AS customer_id, TIMESTAMP('2014-09-12 17:20:43 UTC') AS ord_time),
        (SELECT 2 AS customer_id, TIMESTAMP('2015-04-16 21:44:18 UTC') AS ord_time)
    )
    WHERE num = 1
    
Result:

    customer_id  first_ord                next_ord                 diff
    1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  null
    2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  null
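One quick way to check that the problem lies in what LEAD() returns, and not in DATEDIFF() itself, is to call DATEDIFF() in legacy SQL on the same TIMESTAMP values with no window function involved. The literals below are customer 2's two orders from the sample data. If this returns 216, matching the string-based result in step 1, then the NULLs above come from the values produced by LEAD().

```sql
-- Legacy SQL: DATEDIFF on TIMESTAMP values that never pass through LEAD()
-- (customer 2's two orders from the sample data above)
SELECT
  DATEDIFF(TIMESTAMP('2015-04-16 21:44:18 UTC'),
           TIMESTAMP('2014-09-12 17:20:43 UTC')) AS diff
```

For reference, counting calendar days from 2014-09-12 to 2015-04-16 gives 18 + 31 + 30 + 31 + 31 + 28 + 31 + 16 = 216.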
    
3. To "fix" it, I had to cast ord_time to STRING inside the window functions and back to TIMESTAMP outside them:

    SELECT 
      customer_id,
      TIMESTAMP(first_ord) as first_ord,
      TIMESTAMP(next_ord) as next_ord,
      DATEDIFF(next_ord, first_ord) AS diff
    FROM (
      SELECT 
        customer_id, 
        LEAD(STRING(ord_time), 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord, 
        LEAD(STRING(ord_time), 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
      FROM 
        (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:51:24 UTC') AS ord_time),
        (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:53:31 UTC') AS ord_time),
        (SELECT 1 AS customer_id, TIMESTAMP('2014-05-08 09:53:31 UTC') AS ord_time),
        (SELECT 2 AS customer_id, TIMESTAMP('2014-09-12 17:20:43 UTC') AS ord_time),
        (SELECT 2 AS customer_id, TIMESTAMP('2015-04-16 21:44:18 UTC') AS ord_time)
    )
    WHERE num = 1
    
The result is:

    customer_id  first_ord                next_ord                 diff
    1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  0
    2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  216
    

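This is great. I hadn't thought of doing any casting inside the window function; all of my attempts cast the values after the calculation was done.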
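The question also mentions that converting the timestamps to seconds failed. Assuming the same STRING round-trip from step 3 also works there, a difference in seconds could be computed with legacy SQL's TIMESTAMP_TO_SEC(). This is a sketch against the df.orders table and the cust/ord_time columns named in the question, built on that assumption.

```sql
-- Legacy SQL sketch: cast to STRING inside the window functions,
-- then back to TIMESTAMP before any timestamp arithmetic.
SELECT
  cust,
  TIMESTAMP(first_ord) AS first_ord_ts,
  TIMESTAMP(next_ord) AS next_ord_ts,
  -- seconds between the first and second orders (NULL if there is no second order)
  TIMESTAMP_TO_SEC(TIMESTAMP(next_ord)) - TIMESTAMP_TO_SEC(TIMESTAMP(first_ord)) AS diff_sec
FROM (
  SELECT
    cust,
    LEAD(STRING(ord_time), 0) OVER (PARTITION BY cust ORDER BY ord_time) first_ord,
    LEAD(STRING(ord_time), 1) OVER (PARTITION BY cust ORDER BY ord_time) next_ord,
    ROW_NUMBER() OVER (PARTITION BY cust ORDER BY ord_time) num
  FROM df.orders
)
WHERE num = 1
```

The new output columns are named first_ord_ts and next_ord_ts so they do not shadow the STRING columns of the same name coming from the subquery.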