SQL array aggregation over a multi-level hierarchy

Consider the following fairly standard denormalized model of transaction information:

with transactions as(
  select 'T_10000' as trans_id, 'L_1000' as line_item_id, 'P_100' as part_id
  union all 
  select 'T_10000', 'L_1000', 'P_101'
  union all
  select 'T_10000', 'L_1001', 'P_103'
  union all 
  select 'T_10001', 'L_1002', 'P_104'
)

I would like to denormalize this table further to eliminate all of the repeated values. Arrays in BigQuery seem like a good fit for this.

The following query is close, but it still returns duplicate values in the second column:

select trans_id, array_agg(line_item_id), array_agg(part_id)
from transactions
group by 1
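To see where the repetition comes from, here is a minimal Python sketch that models what this query computes on the sample rows. It is only a model, not BigQuery: the helper name flat_arrays is hypothetical, and it keeps input order, whereas BigQuery's ARRAY_AGG does not guarantee element order without an ORDER BY inside the aggregate.

```python
from collections import defaultdict

# Sample rows from the question: (trans_id, line_item_id, part_id)
ROWS = [
    ("T_10000", "L_1000", "P_100"),
    ("T_10000", "L_1000", "P_101"),
    ("T_10000", "L_1001", "P_103"),
    ("T_10001", "L_1002", "P_104"),
]


def flat_arrays(rows):
    """Model of: SELECT trans_id, ARRAY_AGG(line_item_id), ARRAY_AGG(part_id) ... GROUP BY 1"""
    groups = defaultdict(lambda: ([], []))
    for trans_id, line_item_id, part_id in rows:
        items, parts = groups[trans_id]
        items.append(line_item_id)  # one element per input row, so L_1000 appears twice
        parts.append(part_id)
    return dict(groups)


result = flat_arrays(ROWS)
print(result["T_10000"])
# (['L_1000', 'L_1000', 'L_1001'], ['P_100', 'P_101', 'P_103'])
```

Because ARRAY_AGG emits one element per input row, a line item with several parts is repeated once per part.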

The following is also close, but now the first column contains duplicate values:

select trans_id, line_item_id, array_agg(part_id)
from transactions
group by 1, 2
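A minimal Python model of this GROUP BY 1, 2 query (the helper group_by_trans_and_item is hypothetical, and this is not BigQuery itself) shows why T_10000 appears once per line item:

```python
from collections import defaultdict

# Sample rows from the question: (trans_id, line_item_id, part_id)
ROWS = [
    ("T_10000", "L_1000", "P_100"),
    ("T_10000", "L_1000", "P_101"),
    ("T_10000", "L_1001", "P_103"),
    ("T_10001", "L_1002", "P_104"),
]


def group_by_trans_and_item(rows):
    """Model of: SELECT trans_id, line_item_id, ARRAY_AGG(part_id) ... GROUP BY 1, 2"""
    groups = defaultdict(list)
    for trans_id, line_item_id, part_id in rows:
        groups[(trans_id, line_item_id)].append(part_id)
    # One output row per (trans_id, line_item_id) pair, so trans_id repeats
    return [(t, l, parts) for (t, l), parts in groups.items()]


rows_out = group_by_trans_and_item(ROWS)
for row in rows_out:
    print(row)
# ('T_10000', 'L_1000', ['P_100', 'P_101'])
# ('T_10000', 'L_1001', ['P_103'])
# ('T_10001', 'L_1002', ['P_104'])
```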

Is there a straightforward way to do this?

Is this what you want?

select trans_id, array_agg(distinct line_item_id), array_agg(part_id) as parts
from transactions t
group by trans_id;
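A minimal Python model of this query (the helper distinct_items_all_parts is hypothetical; input order is kept, which BigQuery does not guarantee) suggests that DISTINCT removes the repeated line item, but the two arrays are built independently, so the result no longer records which part belongs to which line item:

```python
# Sample rows from the question: (trans_id, line_item_id, part_id)
ROWS = [
    ("T_10000", "L_1000", "P_100"),
    ("T_10000", "L_1000", "P_101"),
    ("T_10000", "L_1001", "P_103"),
    ("T_10001", "L_1002", "P_104"),
]


def distinct_items_all_parts(rows):
    """Model of: SELECT trans_id, ARRAY_AGG(DISTINCT line_item_id), ARRAY_AGG(part_id) ... GROUP BY trans_id"""
    groups = {}
    for trans_id, line_item_id, part_id in rows:
        items, parts = groups.setdefault(trans_id, ([], []))
        if line_item_id not in items:  # DISTINCT keeps each line item once
            items.append(line_item_id)
        parts.append(part_id)  # parts are collected independently of line items
    return groups


result = distinct_items_all_parts(ROWS)
print(result["T_10000"])
# (['L_1000', 'L_1001'], ['P_100', 'P_101', 'P_103'])
```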

The following is for BigQuery Standard SQL:

#standardSQL 
SELECT trans_id,
  ARRAY_AGG(STRUCT(line_item_id, parts)) items
FROM (
  SELECT trans_id, 
    line_item_id, 
    ARRAY_AGG(part_id) parts
  FROM transactions
  GROUP BY trans_id, line_item_id
)
GROUP BY trans_id   
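When applied to the sample data from the question, the result is one row per trans_id, whose items column is an array of STRUCT(line_item_id, parts) values, so trans_id and line_item_id each appear only once.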
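The shape of that nested result can be sketched in Python. This is a model under stated assumptions, not BigQuery output: the helper nested_items is hypothetical, plain dicts stand in for BigQuery STRUCTs, and BigQuery may return array elements in a different order since neither ARRAY_AGG has an ORDER BY.

```python
from collections import defaultdict

# Sample rows from the question: (trans_id, line_item_id, part_id)
ROWS = [
    ("T_10000", "L_1000", "P_100"),
    ("T_10000", "L_1000", "P_101"),
    ("T_10000", "L_1001", "P_103"),
    ("T_10001", "L_1002", "P_104"),
]


def nested_items(rows):
    """Model of the two-level query: inner GROUP BY (trans_id, line_item_id),
    outer GROUP BY trans_id with ARRAY_AGG(STRUCT(line_item_id, parts))."""
    # Inner query: one parts array per (trans_id, line_item_id)
    inner = defaultdict(list)
    for trans_id, line_item_id, part_id in rows:
        inner[(trans_id, line_item_id)].append(part_id)
    # Outer query: one {line_item_id, parts} struct per line item, grouped by trans_id
    outer = defaultdict(list)
    for (trans_id, line_item_id), parts in inner.items():
        outer[trans_id].append({"line_item_id": line_item_id, "parts": parts})
    return dict(outer)


result = nested_items(ROWS)
for trans_id, items in result.items():
    print(trans_id, items)
# T_10000 [{'line_item_id': 'L_1000', 'parts': ['P_100', 'P_101']}, {'line_item_id': 'L_1001', 'parts': ['P_103']}]
# T_10001 [{'line_item_id': 'L_1002', 'parts': ['P_104']}]
```

Grouping in two stages is what keeps each part attached to its own line item, which a single flat GROUP BY cannot express.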