Google BigQuery SQL: combining regular functions with window functions
I'm looking at the orders table of an e-commerce site and trying to build a customers table that holds some basic information about each customer. I get stuck when I try to combine window functions such as NTH_VALUE with regular aggregate functions. The orders table looks like this:
order_id | customer_id | order_date | revenue
----------------------------------------------
1 | 11 | 2017-01-01 | 5.0
2 | 11 | 2018-02-01 | 2.25
3 | 12 | 2019-03-01 | 1.0
4 | 13 | 2016-04-01 | 12.0
5 | 13 | 2016-05-01 | 15.25
6 | 13 | 2018-06-01 | 25.25
I would like to build a Customers table that looks like this:
customer_id | num_orders | first_order_date | first_order_revenue | second_order_date
--------------------------------------------------------------------------------------
11 | 2 | 2017-01-01 | 5.0 | 2018-02-01
12 | 1 | 2019-03-01 | 1.0 | n/a
13 | 3 | 2016-04-01 | 12.0 | 2016-05-01
My code looks like this:
SELECT
  customer_id,
  COUNT(customer_id) num_orders,
  MIN(order_date) first_order_date,
  FIRST_VALUE(revenue) OVER w1 first_order_revenue,
  NTH_VALUE(order_date, 2) OVER w1 second_order_date
FROM `orders`
GROUP BY customer_id
WINDOW w1 AS (PARTITION BY customer_id ORDER BY order_date ASC)
But it tells me that I need to group by "revenue" and "order_date", with an error like this:
"SELECT list expression references column revenue which is neither grouped nor aggregated at [5:13]"
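But when I do that, it returns one row per order. The first_order_date is different on every row, the first_order_revenue is the same on every row (which is correct), and the second_order_date is correct except on the first row, where it is null: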
customer_id | num_orders | first_order_date | first_order_revenue | second_order_date
--------------------------------------------------------------------------------------
13 | 1 | 2016-04-01 | 12.0 | *null*
13 | 1 | 2016-05-01 | 12.0 | 2016-05-01
13 | 1 | 2018-06-01 | 12.0 | 2016-05-01
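I'm slowly teaching myself SQL, but I couldn't find a solution to this specific problem anywhere online. My guess is that the window functions need to go into a nested SELECT statement that is then joined with the non-window functions. Something like that? I've tried a few different approaches, but nothing has worked so far.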
Thanks for your help!

I think a subquery with conditional aggregation might be simpler:
SELECT customer_id, COUNT(*) num_orders,
       MIN(order_date) first_order_date,
       MAX(CASE WHEN seqnum = 1 THEN revenue END) as revenue_1,
       MAX(CASE WHEN seqnum = 2 THEN revenue END) as revenue_2
FROM (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as seqnum
      FROM `orders` o
     ) o
GROUP BY customer_id;
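The query above returns the revenue of the first two orders, while the question asks for the first order's revenue and the second order's date. A minimal sketch of the same pattern adapted to those columns follows. The inline `orders` CTE is an assumption: it only stands in for the real table, using the sample rows from the question, so the query can be run on its own.

```sql
-- Assumption: inline sample data standing in for the real `orders` table.
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
)
SELECT
  customer_id,
  COUNT(*) AS num_orders,
  MIN(order_date) AS first_order_date,
  -- The CASE is non-NULL for exactly one row per customer, so MAX just picks it.
  MAX(CASE WHEN seqnum = 1 THEN revenue END) AS first_order_revenue,
  -- NULL when the customer has only one order.
  MAX(CASE WHEN seqnum = 2 THEN order_date END) AS second_order_date
FROM (
  -- Number each customer's orders from oldest to newest.
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS seqnum
  FROM orders AS o
) AS o
GROUP BY customer_id
ORDER BY customer_id;
```

With the sample rows, customer 12 gets NULL for second_order_date and customer 13 gets 2016-05-01.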
Or, put the values into an array:
SELECT customer_id, COUNT(*) num_orders,
       MIN(order_date) first_order_date,
       ARRAY_AGG(revenue ORDER BY order_date LIMIT 2) as revenue_1_2
FROM `orders` o
GROUP BY customer_id;
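If scalar columns are preferred over an array column, the array can be indexed directly. This is only a sketch of that idea: it assumes the same sample rows as in the question (the inline `orders` CTE is a stand-in for the real table) and uses SAFE_OFFSET so that customers with a single order get NULL instead of an out-of-bounds error.

```sql
-- Assumption: inline sample data standing in for the real `orders` table.
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
)
SELECT
  customer_id,
  COUNT(*) AS num_orders,
  MIN(order_date) AS first_order_date,
  -- Element 0 always exists, because every group has at least one order.
  ARRAY_AGG(revenue ORDER BY order_date LIMIT 1)[OFFSET(0)] AS first_order_revenue,
  -- SAFE_OFFSET returns NULL when the customer has no second order.
  ARRAY_AGG(order_date ORDER BY order_date LIMIT 2)[SAFE_OFFSET(1)] AS second_order_date
FROM orders
GROUP BY customer_id
ORDER BY customer_id;
```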
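Ah, very interesting. So you use ROW_NUMBER() to create a "seqnum" column, and then you can pull out any column for the rows where it matches the seqnum you want.
Whoops, that was sent too early. I also wanted to add: using MAX is just a workaround to extract those values without having to group by them, right?
@LeeLK. Yes. MAX().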
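For comparison, the window-function route from the question can also be made to work. The following is a sketch, not the answer's method. Two things seem to go wrong in the original attempt. First, window functions are evaluated after GROUP BY, so "revenue" and "order_date" inside OVER count as ungrouped columns. Second, when a window has ORDER BY but no explicit frame, the default frame ends at the current row, so NTH_VALUE(order_date, 2) cannot see the second order from the first row and returns NULL there. The version below drops GROUP BY, widens the frame to the whole partition, and collapses the identical rows with DISTINCT. The inline `orders` CTE and the `per_order` name are assumptions used only to keep the example self-contained.

```sql
-- Assumption: inline sample data standing in for the real `orders` table.
WITH orders AS (
  SELECT 1 AS order_id, 11 AS customer_id, DATE '2017-01-01' AS order_date, 5.0 AS revenue UNION ALL
  SELECT 2, 11, DATE '2018-02-01', 2.25 UNION ALL
  SELECT 3, 12, DATE '2019-03-01', 1.0 UNION ALL
  SELECT 4, 13, DATE '2016-04-01', 12.0 UNION ALL
  SELECT 5, 13, DATE '2016-05-01', 15.25 UNION ALL
  SELECT 6, 13, DATE '2018-06-01', 25.25
),
per_order AS (
  -- Every row of a customer carries the same per-customer values.
  SELECT
    customer_id,
    COUNT(*) OVER (PARTITION BY customer_id) AS num_orders,
    FIRST_VALUE(order_date) OVER w1 AS first_order_date,
    FIRST_VALUE(revenue) OVER w1 AS first_order_revenue,
    NTH_VALUE(order_date, 2) OVER w1 AS second_order_date
  FROM orders
  WINDOW w1 AS (
    PARTITION BY customer_id
    ORDER BY order_date
    -- Explicit frame over the whole partition, instead of the default
    -- frame that stops at the current row.
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )
)
-- Collapse the duplicated rows to one row per customer.
SELECT DISTINCT *
FROM per_order
ORDER BY customer_id;
```

The conditional-aggregation version is probably easier to read and extend. This one mainly shows why the original query produced one row per order and a null in the first row.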