BigQuery: distribute an amount over the rows of an array, with a plug to make the total match

I need to distribute an aggregate amount over the rows of an array. I start with this:

WITH a AS (
  SELECT 1 AS key, 8.55 AS tot_tax
    , ARRAY(
        SELECT AS STRUCT 'item1' AS descr, 9.6 AS amt 
        UNION ALL
        SELECT AS STRUCT 'item2', 183.5
        UNION ALL
        SELECT AS STRUCT 'item3', 26.5
        ) items
  )

--query:

 SELECT * EXCEPT(items) 
  , ARRAY(
      SELECT AS STRUCT *
        , ROUND(amt/(SELECT SUM(amt) FROM UNNEST(items)) * tot_tax, 2) AS tax
      FROM UNNEST(items)
      ) items
 FROM a
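However, because of the (required) rounding, SUM(tax) no longer equals tot_tax. So I want to plug the tiny difference into the largest tax amount so that the totals match. I can do this with another query: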

SELECT * EXCEPT(items)
 , ARRAY(
    SELECT AS STRUCT * EXCEPT(tax)
      , IF(o = 0, ROUND(tax + (SELECT tot_tax - SUM(tax) FROM UNNEST(items)), 2), tax) AS tax
    FROM UNNEST(items) WITH OFFSET o
   ) items
FROM 
  (SELECT * EXCEPT(items) 
    , ARRAY(
       SELECT AS STRUCT *
         , ROUND(amt/(select SUM(amt) FROM UNNEST(items)) * tot_tax, 2) AS tax
       FROM UNNEST(items) ORDER BY amt DESC
      ) items
   FROM a)  
This works fine, but it is cumbersome.
Can this be done better (clarity + performance) in a single query, or with a UDF (JS/SQL)?

Below is for BigQuery Standard SQL:

#standardSQL
SELECT * REPLACE(
  ARRAY(
    SELECT AS STRUCT * EXCEPT(tax, pos),
      IF(ROW_NUMBER() OVER(ORDER BY amt DESC) = 1, 
        ROUND(tax + tot_tax - SUM(tax) OVER(), 2), tax
      ) AS tax
    FROM (
      SELECT * EXCEPT(ratio), ROUND(amt * ratio, 2) AS tax
      FROM UNNEST(items) WITH OFFSET AS pos, 
        (SELECT tot_tax / SUM(amt) AS ratio FROM UNNEST(items))
    )
    ORDER BY pos
  ) AS items)
FROM a
If applied to the sample data from your question, the result is:

Row key tot_tax items.descr items.amt   items.tax    
1   1   8.55    item1       9.6         0.37     
                item2       183.5       7.15     
                item3       26.5        1.03       
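
One quick way to confirm that the plug closes the gap is to sum the item taxes back up per row. This is only a sketch and not part of the original answer: the CTE name taxed and the column sum_tax are introduced here for illustration. It wraps the sample data and the query above, then compares the re-summed tax with tot_tax.

```sql
#standardSQL
-- Sanity check (illustrative sketch): after the plug, the item taxes
-- should add up to tot_tax for every key.
WITH a AS (
  SELECT 1 AS key, 8.55 AS tot_tax,
    ARRAY(
      SELECT AS STRUCT 'item1' AS descr, 9.6 AS amt
      UNION ALL
      SELECT AS STRUCT 'item2', 183.5
      UNION ALL
      SELECT AS STRUCT 'item3', 26.5
    ) AS items
),
taxed AS (  -- hypothetical name; body is the query from the answer above
  SELECT * REPLACE(
    ARRAY(
      SELECT AS STRUCT * EXCEPT(tax, pos),
        IF(ROW_NUMBER() OVER(ORDER BY amt DESC) = 1,
          ROUND(tax + tot_tax - SUM(tax) OVER(), 2), tax
        ) AS tax
      FROM (
        SELECT * EXCEPT(ratio), ROUND(amt * ratio, 2) AS tax
        FROM UNNEST(items) WITH OFFSET AS pos,
          (SELECT tot_tax / SUM(amt) AS ratio FROM UNNEST(items))
      )
      ORDER BY pos
    ) AS items)
  FROM a
)
SELECT key, tot_tax,
  -- Round the re-summed value so floating-point noise does not hide a match.
  (SELECT ROUND(SUM(tax), 2) FROM UNNEST(items)) AS sum_tax
FROM taxed
```

On the sample data, the unplugged shares round to 0.37, 7.14 and 1.03 (8.54 in total), so the 0.01 difference lands on item2 and sum_tax should come out as 8.55.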
Below is a further simplified/refactored version (though possibly less readable):

#standardSQL
SELECT * REPLACE(
  ARRAY(
    SELECT AS STRUCT * EXCEPT(ratio, pos),
      ROUND(ROUND(amt * ratio, 2) +
        IF(ROW_NUMBER() OVER(ORDER BY amt DESC) = 1, tot_tax - SUM(ROUND(amt * ratio, 2)) OVER(), 0)
      , 2) AS tax
    FROM UNNEST(items) WITH OFFSET AS pos,
      (SELECT tot_tax / SUM(amt) AS ratio FROM UNNEST(items))
    ORDER BY pos
  ) AS items)
FROM a
The output is exactly the same.

Thanks, Mikhail. I like the window functions in place of the unneeded array; worth remembering. Because the record set is very large, I removed the ORDER BY pos to speed things up.
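
The question also asks whether a UDF could make this cleaner. The following is a minimal sketch under stated assumptions, not something proposed in the thread: the function name distribute_tax and its parameter names line_items and total_tax are hypothetical, and the parameter types assume the descr/amt shape of the sample data. It wraps the single-query logic in a temporary SQL UDF so the calling query stays short. Whether it performs any better than the inline version is not established here.

```sql
#standardSQL
-- Hypothetical helper: name, parameters and types are assumptions.
CREATE TEMP FUNCTION distribute_tax(
  line_items ARRAY<STRUCT<descr STRING, amt FLOAT64>>,
  total_tax FLOAT64
) AS (
  ARRAY(
    SELECT AS STRUCT * EXCEPT(ratio, pos),
      -- Rounded share, plus the leftover difference on the row with the largest amt.
      ROUND(ROUND(amt * ratio, 2) +
        IF(ROW_NUMBER() OVER(ORDER BY amt DESC) = 1,
          total_tax - SUM(ROUND(amt * ratio, 2)) OVER(), 0)
      , 2) AS tax
    FROM UNNEST(line_items) WITH OFFSET AS pos,
      (SELECT total_tax / SUM(amt) AS ratio FROM UNNEST(line_items))
    ORDER BY pos  -- keeps the original item order
  )
);

WITH a AS (
  SELECT 1 AS key, 8.55 AS tot_tax,
    ARRAY(
      SELECT AS STRUCT 'item1' AS descr, 9.6 AS amt
      UNION ALL
      SELECT AS STRUCT 'item2', 183.5
      UNION ALL
      SELECT AS STRUCT 'item3', 26.5
    ) AS items
)
SELECT * REPLACE(distribute_tax(items, tot_tax) AS items)
FROM a
```

If the item order inside the array does not matter, the ORDER BY pos and the WITH OFFSET clause can be dropped, along the lines of the comment above.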