Google BigQuery: get the first non-null value when grouping (Standard SQL)

I have a table like this:

CUSTOMERS_ID  DATE_SALES  DIMENSION
MARIO1        20200201    NULL
MARIO1        20200113    Spain
MARIO2        20200131    NULL
MARIO3        20200101    France
MARIO3        20191231    Spain
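I need to order by the CUSTOMERS_ID and DATE_SALES (descending) fields. Then I want to group by the CUSTOMERS_ID field and get the first non-null value of the DIMENSION field. The output table would be: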

CUSTOMERS_ID  DIMENSION
MARIO1        Spain
MARIO2        NULL
MARIO3        France
Any ideas? I have tried the COALESCE and FIRST_VALUE functions, but I did not get the result I expected.

Thanks in advance.

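We can use the row-number trick here:

WITH cte AS (
    SELECT CUSTOMERS_ID, DIMENSION,
           ROW_NUMBER() OVER (PARTITION BY CUSTOMERS_ID
                              ORDER BY DIMENSION IS NULL, DATE_SALES DESC) rn
    FROM yourTable
)

SELECT CUSTOMERS_ID, DIMENSION
FROM cte
WHERE rn = 1
ORDER BY CUSTOMERS_ID;

The row number is ordered first by whether DIMENSION is NULL (FALSE sorts before TRUE, so non-NULL rows come first) and then by DATE_SALES descending. Within each customer this puts the most recent sale that has a DIMENSION first and pushes NULLs last, so a NULL DIMENSION receives row number 1 only when the customer has no non-NULL DIMENSION at all. Note that the CTE has to select DIMENSION for the outer query to return it, and that DATE_SALES is compared directly here because the sample values look like integers in YYYYMMDD form rather than timestamps.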
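To try the row-number idea without a BigQuery project, the sketch below runs an equivalent query against an in-memory SQLite database using Python's standard sqlite3 module. This is an illustration only, not BigQuery code: it assumes SQLite 3.25 or later (needed for window functions), it assumes DATE_SALES is stored as an integer in YYYYMMDD form, and the helper name first_non_null_dimension is made up for this example. The ordering puts rows with a NULL DIMENSION last and then sorts by DATE_SALES descending.

```python
import sqlite3

# Sample rows from the question: (CUSTOMERS_ID, DATE_SALES, DIMENSION).
ROWS = [
    ("MARIO1", 20200201, None),
    ("MARIO1", 20200113, "Spain"),
    ("MARIO2", 20200131, None),
    ("MARIO3", 20200101, "France"),
    ("MARIO3", 20191231, "Spain"),
]

# Rank rows per customer: non-NULL DIMENSION first (IS NULL yields 0 before 1),
# then most recent DATE_SALES first. Row number 1 is the row we want.
QUERY = """
WITH cte AS (
    SELECT CUSTOMERS_ID, DIMENSION,
           ROW_NUMBER() OVER (
               PARTITION BY CUSTOMERS_ID
               ORDER BY DIMENSION IS NULL, DATE_SALES DESC
           ) AS rn
    FROM yourTable
)
SELECT CUSTOMERS_ID, DIMENSION
FROM cte
WHERE rn = 1
ORDER BY CUSTOMERS_ID
"""


def first_non_null_dimension(rows):
    """Hypothetical helper: load rows into SQLite and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE yourTable ("
            "CUSTOMERS_ID TEXT, DATE_SALES INTEGER, DIMENSION TEXT)"
        )
        conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for customer, dimension in first_non_null_dimension(ROWS):
        print(customer, dimension)
```

On the sample data this should give Spain for MARIO1, None (NULL) for MARIO2 and France for MARIO3, matching the output table in the question.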

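You can group by customer id and use ARRAY_AGG with IGNORE NULLS, ordering by date inside the aggregate. LIMIT 1 keeps only one element per group, which should reduce memory use. [OFFSET(0)] then extracts that single element, so the result is a plain (non-repeated) field that is easy to work with.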

WITH 
raw_data AS
(
  SELECT 'MARIO1' CUSTOMERS_ID, 20200201 DATE_SALES, NULL as DIMENSION UNION ALL
  SELECT 'MARIO1' CUSTOMERS_ID, 20200113 DATE_SALES, 'Spain' as DIMENSION UNION ALL
  SELECT 'MARIO2' CUSTOMERS_ID, 20200131 DATE_SALES, NULL as DIMENSION UNION ALL
  SELECT 'MARIO3' CUSTOMERS_ID, 20200101 DATE_SALES, 'France' as DIMENSION UNION ALL
  SELECT 'MARIO3' CUSTOMERS_ID, 20191231 DATE_SALES, 'Spain' as DIMENSION
)
SELECT CUSTOMERS_ID, ARRAY_AGG(DIMENSION IGNORE NULLS ORDER BY DATE_SALES DESC LIMIT 1)[OFFSET(0)] as DIMENSION
FROM raw_data
GROUP BY 1

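Below is for BigQuery Standard SQL. Ordering by IF(DIMENSION IS NULL, NULL, DATE_SALES) DESC sorts rows with a NULL DIMENSION last, so the whole row with the latest non-NULL DIMENSION is picked for each customer.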

#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY IF(DIMENSION IS NULL, NULL, DATE_SALES) DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY CUSTOMERS_ID   

If applied to the sample data from your question, the result is:

Row CUSTOMERS_ID    DATE_SALES  DIMENSION    
1   MARIO1          20200113    Spain    
2   MARIO2          20200131    null     
3   MARIO3          20200101    France   

ARRAY_AGG and OFFSET are my new best friends. Thank you, Mikhail!