Google BigQuery SQL: problem combining regular functions and window functions

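I'm looking at an orders table for an e-commerce site and trying to build a customers table that contains some basic information about each customer.

I get stuck when I try to combine window functions such as NTH_VALUE with regular aggregate functions.

The orders table looks like this: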

order_id | customer_id | order_date | revenue
----------------------------------------------
    1    |      11     | 2017-01-01 |  5.0
    2    |      11     | 2018-02-01 |  2.25
    3    |      12     | 2019-03-01 |  1.0
    4    |      13     | 2016-04-01 |  12.0
    5    |      13     | 2016-05-01 |  15.25
    6    |      13     | 2018-06-01 |  25.25

I'd like to build a Customers table that looks like this:

customer_id | num_orders | first_order_date | first_order_revenue | second_order_date
--------------------------------------------------------------------------------------
      11    |     2      |    2017-01-01    |        5.0          |    2018-02-01
      12    |     1      |    2019-03-01    |        1.0          |        n/a
      13    |     3      |    2016-04-01    |        12.0         |    2016-05-01

I expected my code to look something like this:

SELECT
customer_id,
COUNT(customer_id) num_orders,
MIN(order_date) first_order_date,
FIRST_VALUE(revenue) OVER w1 first_order_revenue,
NTH_VALUE(order_date, 2) OVER w1 second_order_date

FROM `orders`
GROUP BY customer_id
WINDOW w1 as (PARTITION BY customer_id ORDER BY order_date ASC)
But it tells me I need to group by revenue and order_date, with the following error:

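"SELECT list expression references column revenue which is neither grouped nor aggregated at [5:13]"

But when I add them to the GROUP BY, it returns one row per order: first_order_date is different for each order, first_order_revenue is the same for every order (correct), and second_order_date is correct except in the first row, where it is null: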

customer_id | num_orders | first_order_date | first_order_revenue | second_order_date
--------------------------------------------------------------------------------------
      13    |      1     |    2016-04-01    |        12.0         |       *null*
      13    |      1     |    2016-05-01    |        12.0         |     2016-05-01
      13    |      1     |    2018-06-01    |        12.0         |     2016-05-01

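I'm slowly teaching myself SQL, but I couldn't find a solution online for this specific problem. My guess is that the window functions may need a nested SELECT statement that is then joined with the non-window functions? Something like that? I've tried a few different approaches, but nothing has worked so far.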


Thanks for your help.
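
A possible explanation for both symptoms, offered as a sketch rather than a definitive reading: in BigQuery, window functions are evaluated after GROUP BY, so inside a grouped query they can only see grouped columns and aggregates, which is why revenue is rejected. The null in the first row is consistent with the default window frame, which with an ORDER BY runs only from the start of the partition to the current row, so the first row's frame has no second value yet. The query below drops GROUP BY entirely and widens the frame to the whole partition. The inline orders data is copied from the question, and the query assumes each customer's order dates are distinct.

```sql
-- Sample data copied from the question, inlined so the query runs on its own.
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
)
SELECT DISTINCT  -- every row of a customer carries the same values, so DISTINCT collapses them
  customer_id,
  COUNT(*) OVER (PARTITION BY customer_id) AS num_orders,
  FIRST_VALUE(order_date) OVER w1 AS first_order_date,
  FIRST_VALUE(revenue) OVER w1 AS first_order_revenue,
  NTH_VALUE(order_date, 2) OVER w1 AS second_order_date  -- NULL when there is no second order
FROM orders
WINDOW w1 AS (
  PARTITION BY customer_id
  ORDER BY order_date
  -- Explicit frame: look at the whole partition, not just the rows up to the current one.
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY customer_id;
```

Because no GROUP BY is involved, every expression in the SELECT list is a window function or the partition key, and the per-order rows are reduced to one row per customer only at the DISTINCT step.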

I think a subquery and conditional aggregation might be simpler:

SELECT customer_id, COUNT(*) num_orders,
       MIN(order_date) first_order_date,
       MAX(CASE WHEN seqnum = 1 THEN revenue END) as revenue_1,
       MAX(CASE WHEN seqnum = 2 THEN revenue END) as revenue_2
FROM (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as seqnum
      FROM `orders` o
     ) o
GROUP BY customer_id;
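
The query above returns the first and second order revenue, while the question asks for first_order_revenue and second_order_date. As a minimal sketch of the same technique with those columns, the only change is which column each CASE expression returns. The inline orders data is copied from the question so that the query is self-contained; the output column names follow the desired Customers table.

```sql
-- Sample data copied from the question, inlined so the query runs on its own.
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
)
SELECT
  customer_id,
  COUNT(*) AS num_orders,
  MIN(order_date) AS first_order_date,
  -- CASE without ELSE yields NULL for non-matching rows; MAX ignores those NULLs.
  MAX(CASE WHEN seqnum = 1 THEN revenue END) AS first_order_revenue,
  MAX(CASE WHEN seqnum = 2 THEN order_date END) AS second_order_date
FROM (
  -- Number each customer's orders from oldest to newest.
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS seqnum
  FROM orders AS o
) AS o
GROUP BY customer_id
ORDER BY customer_id;
```

For customer 12, who has a single order, no row has seqnum = 2, so second_order_date comes back as NULL.
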
Or, put the values into an array:

SELECT customer_id, COUNT(*) num_orders,
       MIN(order_date) first_order_date,
       ARRAY_AGG(revenue ORDER BY order_date LIMIT 2) as revenue_1_2
FROM `orders` o
GROUP BY customer_id;
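
If scalar columns are preferred over an array, one option is to index into the aggregated array directly. The sketch below assumes that revenue and order_date are never NULL (ARRAY_AGG raises an error if the resulting array would contain a NULL element) and again inlines the sample data from the question.

```sql
-- Sample data copied from the question, inlined so the query runs on its own.
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
)
SELECT
  customer_id,
  COUNT(*) AS num_orders,
  MIN(order_date) AS first_order_date,
  -- Every group has at least one row, so OFFSET(0) is always in range.
  ARRAY_AGG(revenue ORDER BY order_date LIMIT 1)[OFFSET(0)] AS first_order_revenue,
  -- SAFE_OFFSET returns NULL instead of an error when there is no second order.
  ARRAY_AGG(order_date ORDER BY order_date LIMIT 2)[SAFE_OFFSET(1)] AS second_order_date
FROM orders
GROUP BY customer_id
ORDER BY customer_id;
```

This version needs no subquery, at the cost of building one small array per extracted column.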

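Ah, very interesting. So ROW_NUMBER() creates the seqnum column, and any column can then be pulled out wherever the row matches the desired seqnum. Whoops, that sent too early. I also wanted to add: using MAX is just a workaround to pull out those values without having to group by them, right?

@LeeLK. Yes, regarding MAX().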
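
To illustrate the point about MAX being a workaround: each customer has exactly one row with seqnum = 1, so the CASE expression yields one non-NULL value per group and NULL everywhere else. Any aggregate that ignores NULLs then returns that one value. The sketch below, using the sample data from the question and the hypothetical column names via_max and via_min, shows MAX and MIN side by side.

```sql
-- Sample data copied from the question, inlined so the query runs on its own.
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
),
numbered AS (
  -- Number each customer's orders from oldest to newest.
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS seqnum
  FROM orders AS o
)
SELECT
  customer_id,
  -- Only one row per customer has seqnum = 1, so both aggregates see a single non-NULL value.
  MAX(CASE WHEN seqnum = 1 THEN revenue END) AS via_max,
  MIN(CASE WHEN seqnum = 1 THEN revenue END) AS via_min
FROM numbered
GROUP BY customer_id
ORDER BY customer_id;
```

Both columns hold the first order's revenue for every customer, which is why the choice between MAX and MIN here is a matter of convention rather than behavior.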